Qwen2.5-Coder Technical Report

B Hui, J Yang, Z Cui, J Yang, D Liu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its
predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5 …

APIGen: Automated pipeline for generating verifiable and diverse function-calling datasets

Z Liu, T Hoang, J Zhang, M Zhu, T Lan… - arXiv preprint arXiv …, 2024 - arxiv.org
The advancement of function-calling agent models requires diverse, reliable, and high-quality
datasets. This paper presents APIGen, an automated data generation pipeline …
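As a rough illustration of what a verifiable function-calling datapoint could look like, the Python sketch below builds one toy example and checks it by executing the proposed call against a stand-in function. The query/tools/answers schema, the get_weather helper, and the execution check are assumptions for illustration only, not APIGen's actual pipeline or released data format.

    # Illustrative only: a toy "verifiable" function-calling datapoint.
    # The schema and the check are assumptions, not the APIGen format.
    import json

    def get_weather(city: str, unit: str = "celsius") -> dict:
        # Stand-in executable API so the generated call can be verified.
        return {"city": city, "unit": unit, "temp": 21}

    datapoint = {
        "query": "What's the weather in Berlin in celsius?",
        "tools": [{
            "name": "get_weather",
            "parameters": {"city": "string", "unit": "string"},
        }],
        "answers": [{"name": "get_weather",
                     "arguments": {"city": "Berlin", "unit": "celsius"}}],
    }

    # Execution check: the proposed call must run against the real function.
    registry = {"get_weather": get_weather}
    for call in datapoint["answers"]:
        result = registry[call["name"]](**call["arguments"])
        assert isinstance(result, dict)

    print(json.dumps(datapoint, indent=2))

The point of the execution step is that each generated call can be filtered automatically: if the call fails to run or returns an implausible result, the datapoint is discarded rather than kept on trust.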

Evaluating language models for efficient code generation

J Liu, S Xie, J Wang, Y Wei, Y Ding, L Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Differential Performance Evaluation (DPE), a framework designed to reliably
evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding …

Agent-as-a-Judge: Evaluate agents with agents

M Zhuge, C Zhao, D Ashley, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes--ignoring the step-by-step nature of …

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

S Dou, H Jia, S Wu, H Zheng, W Zhou, M Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
The increasing development of large language models (LLMs) in code generation has
drawn significant attention among researchers. To enhance LLM-based code generation …

OpenCoder: The open cookbook for top-tier code large language models

S Huang, T Cheng, JK Liu, J Hao, L Song, Y Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) for code have become indispensable in various domains,
including code generation, reasoning tasks and agent systems. While open-access code …

EffiBench: Benchmarking the efficiency of automatically generated code

D Huang, Y Qing, W Shang, H Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
Code generation models have increasingly become integral to aiding software
development. Although current research has thoroughly examined the correctness of the …

LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead

J He, C Treude, D Lo - ACM Transactions on Software Engineering and …, 2025 - dl.acm.org
Integrating Large Language Models (LLMs) into autonomous agents marks a significant shift
in the research landscape by offering cognitive abilities that are competitive with human …

Evaluating and aligning CodeLLMs on human preference

J Yang, J Yang, K Jin, Y Miao, L Zhang, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Code large language models (codeLLMs) have made significant strides in code generation.
Most previous code-related benchmarks, which consist of various programming exercises …

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Z Yu, Y Zhao, A Cohan, XP Zhang - arXiv preprint arXiv:2412.21199, 2024 - arxiv.org
We introduce self-invoking code generation, a new task designed to evaluate the
progressive reasoning and problem-solving capabilities of LLMs. In this task, models are …
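To make the self-invoking setup concrete, the Python sketch below shows what such a problem pair could look like: a base function plus a harder follow-up whose reference solution reuses it. The function names, problems, and tests are invented for illustration and are not items from HumanEval Pro or MBPP Pro.

    # Illustrative sketch of a self-invoking problem pair (invented, not
    # taken from HumanEval Pro / MBPP Pro): the second problem's solution
    # must call the solution to the first.

    def word_count(text: str) -> int:
        """Base problem: count the words in a sentence."""
        return len(text.split())

    def longest_sentence(sentences: list[str]) -> str:
        """Self-invoking problem: return the sentence with the most words,
        reusing the base solution word_count."""
        return max(sentences, key=word_count)

    # Simple checks, in the spirit of test-based evaluation.
    assert word_count("large language models generate code") == 5
    assert longest_sentence(["a b", "a b c", "a"]) == "a b c"

The design intent of such pairs is that a model cannot solve the second problem by pattern-matching alone; it has to recognize that its own earlier solution is the right building block and invoke it correctly.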