Large language models for software engineering: A systematic literature review
Large Language Models (LLMs) have significantly impacted numerous domains, including
Software Engineering (SE). Many recent publications have explored LLMs applied to …
A systematic literature review on source code similarity measurement and clone detection: Techniques, applications, and challenges
Measuring and evaluating source code similarity is a fundamental software engineering
activity that embraces a broad range of applications, including but not limited to code …
The Stack: 3 TB of permissively licensed source code
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial
Intelligence (AI)--not only for natural language processing but also for code understanding …
CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation
Pre-trained models for Natural Languages (NL) like BERT and GPT have been recently
shown to transfer well to Programming Languages (PL) and largely benefit a broad set of …
SantaCoder: don't reach for the stars!
The BigCode project is an open-scientific collaboration working on the responsible
development of large language models for code. This tech report describes the progress of …
Efficient training of language models to fill in the middle
We show that autoregressive language models can learn to infill text after we apply a
straightforward transformation to the dataset, which simply moves a span of text from the …
Unsupervised translation of programming languages
A transcompiler, also known as source-to-source translator, is a system that converts source
code from a high-level programming language (such as C++ or Python) to another …
NatGen: generative pre-training by “naturalizing” source code
Pre-trained Generative Language models (e.g., PLBART, CodeT5, SPT-Code) for source
code yielded strong results on several tasks in the past few years, including code generation …
An empirical comparison of pre-trained models of source code
While a large number of pre-trained models of source code have been successfully
developed and applied to a variety of software engineering (SE) tasks in recent years, our …
Natural language to code translation with execution
Generative models of code, pretrained on large corpora of programs, have shown great
success in translating natural language to code (Chen et al., 2021; Austin et al., 2021; Li et …