Reflexion: Language agents with verbal reinforcement learning

N Shinn, F Cassano, A Gopinath… - Advances in …, 2023 - proceedings.neurips.cc
Large language models (LLMs) have been increasingly used to interact with external
environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains …

WizardCoder: Empowering code large language models with Evol-Instruct

Z Luo, C Xu, P Zhao, Q Sun, X Geng, W Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated
exceptional performance in code-related tasks. However, most existing models are solely …

LiveCodeBench: Holistic and contamination free evaluation of large language models for code

N Jain, K Han, A Gu, WD Li, F Yan, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) applied to code-related applications have emerged as a
prominent field, attracting significant interest from both academia and industry. However, as …

Large language models meet nl2code: A survey

D Zan, B Chen, F Zhang, D Lu, B Wu, B Guan… - arXiv preprint arXiv …, 2022 - arxiv.org
The task of generating code from a natural language description, or NL2Code, is considered
a pressing and significant challenge in code intelligence. Thanks to the rapid development …

Qwen2.5-Coder technical report

B Hui, J Yang, Z Cui, J Yang, D Liu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its
predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5 …

CoderEval: A benchmark of pragmatic code generation with generative pre-trained models

H Yu, B Shen, D Ran, J Zhang, Q Zhang, Y Ma… - Proceedings of the 46th …, 2024 - dl.acm.org
Code generation models based on the pre-training and fine-tuning paradigm have been
increasingly adopted by both academia and industry, resulting in well-known industrial …

A survey on large language models for code generation

J Jiang, F Wang, J Shen, S Kim, S Kim - arXiv preprint arXiv:2406.00515, 2024 - arxiv.org
Large Language Models (LLMs) have achieved remarkable advances across diverse
code-related tasks, known as Code LLMs, particularly in code generation, which produces …

“What it wants me to say”: Bridging the abstraction gap between end-user programmers and code-generating large language models

MX Liu, A Sarkar, C Negreanu, B Zorn… - Proceedings of the …, 2023 - dl.acm.org
Code-generating large language models map natural language to code. However, only a
small portion of the infinite space of naturalistic utterances is effective at guiding code …

CodeBERTScore: Evaluating code generation with pretrained models of code

S Zhou, U Alon, S Agarwal, G Neubig - arXiv preprint arXiv:2302.05527, 2023 - arxiv.org
Since the rise of neural natural-language-to-code models (NL→Code) that can generate
long expressions and statements rather than a single next-token, one of the major problems …

CRUXEval: A benchmark for code reasoning, understanding and execution

A Gu, B Rozière, H Leather, A Solar-Lezama… - arXiv preprint arXiv …, 2024 - arxiv.org
We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a
benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an …