Qwen2.5-Coder technical report
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its
predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5 …
APIGen: Automated pipeline for generating verifiable and diverse function-calling datasets
The advancement of function-calling agent models requires diverse, reliable, and high-
quality datasets. This paper presents APIGen, an automated data generation pipeline …
Evaluating language models for efficient code generation
We introduce Differential Performance Evaluation (DPE), a framework designed to reliably
evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding …
Agent-as-a-judge: Evaluate agents with agents
Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes--ignoring the step-by-step nature of …
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study
The increasing development of large language models (LLMs) in code generation has
drawn significant attention among researchers. To enhance LLM-based code generation …
OpenCoder: The open cookbook for top-tier code large language models
Large language models (LLMs) for code have become indispensable in various domains,
including code generation, reasoning tasks and agent systems. While open-access code …
EffiBench: Benchmarking the efficiency of automatically generated code
Code generation models have increasingly become integral to aiding software
development. Although current research has thoroughly examined the correctness of the …
LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead
Integrating Large Language Models (LLMs) into autonomous agents marks a significant shift
in the research landscape by offering cognitive abilities that are competitive with human …
Evaluating and aligning codeLLMs on human preference
Code large language models (codeLLMs) have made significant strides in code generation.
Most previous code-related benchmarks, which consist of various programming exercises …
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
We introduce self-invoking code generation, a new task designed to evaluate the
progressive reasoning and problem-solving capabilities of LLMs. In this task, models are …