Qwen2.5-Coder Technical Report

B Hui, J Yang, Z Cui, J Yang, D Liu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its
predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5 …

APIGen: Automated pipeline for generating verifiable and diverse function-calling datasets

Z Liu, T Hoang, J Zhang, M Zhu, T Lan… - arXiv preprint arXiv …, 2024 - arxiv.org
The advancement of function-calling agent models requires diverse, reliable, and high-quality
datasets. This paper presents APIGen, an automated data generation pipeline …
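As a rough illustration of what a verifiable function-calling datapoint could look like, the Python sketch below builds one toy example and checks it by executing the proposed call against a stand-in function. The query/tools/answers schema, the get_weather helper, and the execution check are assumptions for illustration only, not APIGen's actual pipeline or released data format.

    # Illustrative only: a toy "verifiable" function-calling datapoint.
    # The schema and the check are assumptions, not the APIGen format.
    import json

    def get_weather(city: str, unit: str = "celsius") -> dict:
        # Stand-in executable API so the generated call can be verified.
        return {"city": city, "unit": unit, "temp": 21}

    datapoint = {
        "query": "What's the weather in Berlin in celsius?",
        "tools": [{
            "name": "get_weather",
            "parameters": {"city": "string", "unit": "string"},
        }],
        "answers": [{"name": "get_weather",
                     "arguments": {"city": "Berlin", "unit": "celsius"}}],
    }

    # Execution check: the proposed call must run against the real function.
    registry = {"get_weather": get_weather}
    for call in datapoint["answers"]:
        result = registry[call["name"]](**call["arguments"])
        assert isinstance(result, dict)

    print(json.dumps(datapoint, indent=2))

The point of the execution step is that each generated call can be filtered automatically: if the call fails to run or returns an implausible result, the datapoint is discarded rather than kept on trust.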

Evaluating language models for efficient code generation

J Liu, S Xie, J Wang, Y Wei, Y Ding, L Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Differential Performance Evaluation (DPE), a framework designed to reliably
evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding …

Agent-as-a-Judge: Evaluate agents with agents

M Zhuge, C Zhao, D Ashley, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes--ignoring the step-by-step nature of …

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

S Dou, H Jia, S Wu, H Zheng, W Zhou, M Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
The increasing development of large language models (LLMs) in code generation has
drawn significant attention among researchers. To enhance LLM-based code generation …

OpenCoder: The open cookbook for top-tier code large language models

S Huang, T Cheng, JK Liu, J Hao, L Song, Y Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) for code have become indispensable in various domains,
including code generation, reasoning tasks and agent systems. While open-access code …

EffiBench: Benchmarking the efficiency of automatically generated code

D Huang, Y Qing, W Shang, H Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
Code generation models have increasingly become integral to aiding software
development. Although current research has thoroughly examined the correctness of the …

LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead

J He, C Treude, D Lo - ACM Transactions on Software Engineering and …, 2025 - dl.acm.org
Integrating Large Language Models (LLMs) into autonomous agents marks a significant shift
in the research landscape by offering cognitive abilities that are competitive with human …

Evaluating and aligning CodeLLMs on human preference

J Yang, J Yang, K Jin, Y Miao, L Zhang, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Code large language models (codeLLMs) have made significant strides in code generation.
Most previous code-related benchmarks, which consist of various programming exercises …

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Z Yu, Y Zhao, A Cohan, XP Zhang - arXiv preprint arXiv:2412.21199, 2024 - arxiv.org
We introduce self-invoking code generation, a new task designed to evaluate the
progressive reasoning and problem-solving capabilities of LLMs. In this task, models are …
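To make the self-invoking setup concrete, the Python sketch below shows what such a problem pair could look like: a base function plus a harder follow-up whose reference solution reuses it. The function names, problems, and tests are invented for illustration and are not items from HumanEval Pro or MBPP Pro.

    # Illustrative sketch of a self-invoking problem pair (invented, not
    # taken from HumanEval Pro / MBPP Pro): the second problem's solution
    # must call the solution to the first.

    def word_count(text: str) -> int:
        """Base problem: count the words in a sentence."""
        return len(text.split())

    def longest_sentence(sentences: list[str]) -> str:
        """Self-invoking problem: return the sentence with the most words,
        reusing the base solution word_count."""
        return max(sentences, key=word_count)

    # Simple checks, in the spirit of test-based evaluation.
    assert word_count("large language models generate code") == 5
    assert longest_sentence(["a b", "a b c", "a"]) == "a b c"

The design intent of such pairs is that a model cannot solve the second problem by pattern-matching alone; it has to recognize that its own earlier solution is the right building block and invoke it correctly.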