The Stack: 3 TB of permissively licensed source code

D Kocetkov, R Li, LB Allal, J Li, C Mou… - arXiv preprint arXiv …, 2022 - arxiv.org
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial
Intelligence (AI)--not only for natural language processing but also for code understanding …

SantaCoder: don't reach for the stars!

LB Allal, R Li, D Kocetkov, C Mou, C Akiki… - arXiv preprint arXiv …, 2023 - arxiv.org
The BigCode project is an open-scientific collaboration working on the responsible
development of large language models for code. This tech report describes the progress of …

OctoPack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

MultiPL-E: a scalable and polyglot approach to benchmarking neural code generation

F Cassano, J Gouwar, D Nguyen… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Large language models have demonstrated the ability to generate both natural language
and programming language text. Although contemporary code generation models are …

Structured chain-of-thought prompting for code generation

J Li, G Li, Y Li, Z Jin - ACM Transactions on Software Engineering and …, 2025 - dl.acm.org
Large Language Models (LLMs) have shown impressive abilities in code generation. Chain-
of-Thought (CoT) prompting is the state-of-the-art approach to utilizing LLMs. CoT prompting …

Exploring parameter-efficient fine-tuning techniques for code generation with large language models

M Weyssow, X Zhou, K Kim, D Lo… - ACM Transactions on …, 2023 - dl.acm.org
Large language models (LLMs) demonstrate impressive capabilities to generate accurate
code snippets given natural language intents in a zero-shot manner, i.e., without the need for …

CrossCodeEval: A diverse and multilingual benchmark for cross-file code completion

Y Ding, Z Wang, W Ahmad, H Ding… - Advances in …, 2024 - proceedings.neurips.cc
Code completion models have made significant progress in recent years, yet current popular
evaluation datasets, such as HumanEval and MBPP, predominantly focus on code …

ClassEval: A manually-crafted benchmark for evaluating LLMs on class-level code generation

X Du, M Liu, K Wang, H Wang, J Liu, Y Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we make the first attempt to evaluate LLMs in a more challenging code
generation scenario, i.e., class-level code generation. We first manually construct the first …

Exploring the effectiveness of large language models in generating unit tests

ML Siddiq, J Santos, RH Tanvir, N Ulfat… - arXiv preprint arXiv …, 2023 - researchgate.net
A code generation model generates code by taking a prompt from a code comment, existing
code, or a combination of both. Although code generation models (e.g., GitHub Copilot) are …

A survey of large language models for code: Evolution, benchmarking, and future trends

Z Zheng, K Ning, Y Wang, J Zhang, D Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
General large language models (LLMs), represented by ChatGPT, have demonstrated
significant potential in tasks such as code generation in software engineering. This has led …