LiveCodeBench: Holistic and contamination free evaluation of large language models for code

N Jain, K Han, A Gu, WD Li, F Yan, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) applied to code-related applications have emerged as a
prominent field, attracting significant interest from both academia and industry. However, as …

CRUXEval: A benchmark for code reasoning, understanding and execution

A Gu, B Rozière, H Leather, A Solar-Lezama… - arXiv preprint arXiv …, 2024 - arxiv.org
We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a
benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an …

Transformers in source code generation: A comprehensive survey

H Ghaemi, Z Alizadehsani, A Shahraki… - Journal of Systems …, 2024 - Elsevier
Transformers have revolutionized natural language processing (NLP) and have had a huge
impact on automating tasks. Recently, transformers have led to the development of powerful …

Multilingual training for software engineering

T Ahmed, P Devanbu - Proceedings of the 44th International Conference …, 2022 - dl.acm.org
Well-trained machine-learning models, which leverage large amounts of open-source
software data, have now become an interesting approach to automating many software …

A catalog of data smells for coding tasks

A Vitale, R Oliveto, S Scalabrino - ACM Transactions on Software …, 2024 - dl.acm.org
Large Language Models (LLMs) are increasingly becoming fundamental in supporting
software developers in coding tasks. The massive datasets used for training LLMs are often …

Formal specifications from natural language

C Hahn, F Schmitt, JJ Tillman, N Metzger… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the generalization abilities of language models when translating natural language
into formal specifications with complex semantics. In particular, we fine-tune language …

EffiBench: Benchmarking the efficiency of automatically generated code

D Huang, Y Qing, W Shang, H Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
Code generation models have increasingly become integral to aiding software
development. Although current research has thoroughly examined the correctness of the …

The counterfeit conundrum: Can code language models grasp the nuances of their incorrect generations?

A Gu, WD Li, N Jain, TX Olausson, C Lee, K Sen… - arXiv preprint arXiv …, 2024 - arxiv.org
While language models are increasingly more proficient at code generation, they still
frequently generate incorrect programs. Many of these programs are obviously wrong, but …

MHPP: Exploring the capabilities and limitations of language models beyond basic code generation

J Dai, J Lu, Y Feng, D Huang, G Zeng, R Ruan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have greatly improved code
generation, specifically at the function level. For instance, GPT-4o has achieved a 91.0 …

The Vault: A comprehensive multilingual dataset for advancing code understanding and generation

DN Manh, NL Hai, ATV Dau, AM Nguyen… - arXiv preprint arXiv …, 2023 - arxiv.org
We present The Vault, a dataset of high-quality code-text pairs in multiple programming
languages for training large language models to understand and generate code. We present …