Qwen2.5-Coder technical report

B Hui, J Yang, Z Cui, J Yang, D Liu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its
predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5 …

StarCoder 2 and The Stack v2: The next generation

A Lozhkov, R Li, LB Allal, F Cassano… - arXiv preprint arXiv …, 2024 - arxiv.org
The BigCode project, an open-scientific collaboration focused on the responsible
development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In …

Evaluating language models for efficient code generation

J Liu, S Xie, J Wang, Y Wei, Y Ding, L Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Differential Performance Evaluation (DPE), a framework designed to reliably
evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding …

The Mamba in the Llama: Distilling and accelerating hybrid models

J Wang, D Paliotta, A May, AM Rush, T Dao - arXiv preprint arXiv …, 2024 - arxiv.org
Linear RNN architectures, like Mamba, can be competitive with Transformer models in
language modeling while having advantageous deployment characteristics. Given the focus …

CodeMind: A framework to challenge large language models for code reasoning

C Liu, SD Zhang, AR Ibrahimzada… - arXiv preprint arXiv …, 2024 - arxiv.org
Solely relying on test passing to evaluate Large Language Models (LLMs) for code
synthesis may result in unfair assessment or promoting models with data leakage. As an …

MHPP: Exploring the capabilities and limitations of language models beyond basic code generation

J Dai, J Lu, Y Feng, D Huang, G Zeng, R Ruan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have greatly improved code
generation, specifically at the function level. For instance, GPT-4o has achieved a 91.0 …

Evaluating and aligning CodeLLMs on human preference

J Yang, J Yang, K Jin, Y Miao, L Zhang, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Code large language models (codeLLMs) have made significant strides in code generation.
Most previous code-related benchmarks, which consist of various programming exercises …

SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning

Y Ding, J Peng, MJ Min, G Kaiser, J Yang… - arXiv preprint arXiv …, 2024 - openreview.net
Code Large Language Models (Code LLMs) have excelled at tasks like code
completion but often miss deeper semantics such as execution effects and dynamic states …

R2E: Turning any GitHub Repository into a Programming Agent Environment

N Jain, M Shetty, T Zhang, K Han, K Sen… - Forty-first International …, 2024 - openreview.net
While Large Language Models' (LLMs) coding capabilities have advanced rapidly,
corresponding evaluation benchmarks on real-world programming setups are yet to catch …

TestGenEval: A real world unit test generation and test completion benchmark

K Jain, G Synnaeve, B Rozière - arXiv preprint arXiv:2410.00752, 2024 - arxiv.org
Code generation models can help improve many common software tasks ranging from code
completion to defect prediction. Most of the existing benchmarks for code generation LLMs …