Data Interpreter: An LLM agent for data science

S Hong, Y Lin, B Liu, B Liu, B Wu, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Model (LLM)-based agents have shown effectiveness across many
applications. However, their use in data science scenarios requiring solving long-term …

Automated program repair via conversation: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT

CS Xia, L Zhang - Proceedings of the 33rd ACM SIGSOFT International …, 2024 - dl.acm.org
Automated Program Repair (APR) aims to automatically generate patches for buggy
programs. Traditional APR techniques suffer from a lack of patch variety as they rely heavily …

Agent-as-a-judge: Evaluate agents with agents

M Zhuge, C Zhao, D Ashley, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes--ignoring the step-by-step nature of …

AgentHarm: A benchmark for measuring harmfulness of LLM agents

M Andriushchenko, A Souly, M Dziemian… - arXiv preprint arXiv …, 2024 - arxiv.org
The robustness of LLMs to jailbreak attacks, where users design prompts to circumvent
safety measures and misuse model capabilities, has been studied primarily for LLMs acting …

OpenCoder: The open cookbook for top-tier code large language models

S Huang, T Cheng, JK Liu, J Hao, L Song, Y Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) for code have become indispensable in various domains,
including code generation, reasoning tasks and agent systems. While open-access code …

MarsCode Agent: AI-native automated bug fixing

Y Liu, P Gao, X Wang, J Liu, Y Shi, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in large language models (LLMs) have shown significant potential to
automate various software development tasks, including code completion, test generation …

SpecRover: Code intent extraction via LLMs

H Ruan, Y Zhang, A Roychoudhury - arXiv preprint arXiv:2408.02232, 2024 - arxiv.org
Autonomous program improvement typically involves automatically producing bug fixes and
feature additions. Such program improvement can be accomplished by a combination of …

RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph

S Ouyang, W Yu, K Ma, Z Xiao, Z Zhang, M Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) excel in code generation yet struggle with modern AI
software engineering tasks. Unlike traditional function-level or file-level coding tasks, AI …

AutoGLM: Autonomous foundation agents for GUIs

X Liu, B Qin, D Liang, G Dong, H Lai, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation
agents for autonomous control of digital devices through Graphical User Interfaces (GUIs) …

Diversity empowers intelligence: Integrating expertise of software engineering agents

K Zhang, W Yao, Z Liu, Y Feng, Z Liu, R Murthy… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language model (LLM) agents have shown great potential in solving real-world
software engineering (SWE) problems. The most advanced open-source SWE agent can …