Data interpreter: An LLM agent for data science
Large Language Model (LLM)-based agents have shown effectiveness across many
applications. However, their use in data science scenarios requiring solving long-term …
Automated program repair via conversation: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT
Automated Program Repair (APR) aims to automatically generate patches for buggy
programs. Traditional APR techniques suffer from a lack of patch variety as they rely heavily …
Agent-as-a-judge: Evaluate agents with agents
Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes--ignoring the step-by-step nature of …
AgentHarm: A benchmark for measuring harmfulness of LLM agents
The robustness of LLMs to jailbreak attacks, where users design prompts to circumvent
safety measures and misuse model capabilities, has been studied primarily for LLMs acting …
OpenCoder: The open cookbook for top-tier code large language models
Large language models (LLMs) for code have become indispensable in various domains,
including code generation, reasoning tasks and agent systems. While open-access code …
MarsCode Agent: AI-native automated bug fixing
Recent advances in large language models (LLMs) have shown significant potential to
automate various software development tasks, including code completion, test generation …
SpecRover: Code intent extraction via LLMs
Autonomous program improvement typically involves automatically producing bug fixes and
feature additions. Such program improvement can be accomplished by a combination of …
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
Large Language Models (LLMs) excel in code generation yet struggle with modern AI
software engineering tasks. Unlike traditional function-level or file-level coding tasks, AI …
AutoGLM: Autonomous foundation agents for GUIs
We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation
agents for autonomous control of digital devices through Graphical User Interfaces (GUIs) …
Diversity empowers intelligence: Integrating expertise of software engineering agents
Large language model (LLM) agents have shown great potential in solving real-world
software engineering (SWE) problems. The most advanced open-source SWE agent can …