Qwen2.5-Coder technical report

B Hui, J Yang, Z Cui, J Yang, D Liu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its
predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5 …

StarCoder 2 and The Stack v2: The next generation

A Lozhkov, R Li, LB Allal, F Cassano… - arXiv preprint arXiv …, 2024 - arxiv.org
The BigCode project, an open-scientific collaboration focused on the responsible
development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In …

Evaluating language models for efficient code generation

J Liu, S Xie, J Wang, Y Wei, Y Ding, L Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Differential Performance Evaluation (DPE), a framework designed to reliably
evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding …

The Mamba in the Llama: Distilling and accelerating hybrid models

J Wang, D Paliotta, A May, AM Rush, T Dao - arXiv preprint arXiv …, 2024 - arxiv.org
Linear RNN architectures, like Mamba, can be competitive with Transformer models in
language modeling while having advantageous deployment characteristics. Given the focus …

CodeMind: A framework to challenge large language models for code reasoning

C Liu, SD Zhang, AR Ibrahimzada… - arXiv preprint arXiv …, 2024 - arxiv.org
Solely relying on test passing to evaluate Large Language Models (LLMs) for code
synthesis may result in unfair assessment or promoting models with data leakage. As an …

MHPP: Exploring the capabilities and limitations of language models beyond basic code generation

J Dai, J Lu, Y Feng, D Huang, G Zeng, R Ruan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have greatly improved code
generation, specifically at the function level. For instance, GPT-4o has achieved a 91.0 …

Evaluating and aligning CodeLLMs on human preference

J Yang, J Yang, K Jin, Y Miao, L Zhang, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Code large language models (codeLLMs) have made significant strides in code generation.
Most previous code-related benchmarks, which consist of various programming exercises …

SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning

Y Ding, J Peng, MJ Min, G Kaiser, J Yang… - arXiv preprint arXiv …, 2024 - openreview.net
Code Large Language Models (Code LLMs) have excelled at tasks like code
completion but often miss deeper semantics such as execution effects and dynamic states …

R2E: Turning any GitHub Repository into a Programming Agent Environment

N Jain, M Shetty, T Zhang, K Han, K Sen… - Forty-first International …, 2024 - openreview.net
While Large Language Models' (LLMs) coding capabilities have advanced rapidly,
corresponding evaluation benchmarks on real-world programming setups are yet to catch …

TestGenEval: A real world unit test generation and test completion benchmark

K Jain, G Synnaeve, B Rozière - arXiv preprint arXiv:2410.00752, 2024 - arxiv.org
Code generation models can help improve many common software tasks ranging from code
completion to defect prediction. Most of the existing benchmarks for code generation LLMs …