The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

AI deception: A survey of examples, risks, and potential solutions

PS Park, S Goldstein, A O'Gara, M Chen, D Hendrycks - Patterns, 2024 - cell.com
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …

Augmented language models: a survey

G Mialon, R Dessì, M Lomeli, C Nalmpantis… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …

MetaGPT: Meta programming for multi-agent collaborative framework

S Hong, X Zheng, J Chen, Y Cheng, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, remarkable progress has been made in automated task-solving through the use of
multi-agent systems driven by large language models (LLMs). However, existing LLM-based multi …

Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the Machiavelli benchmark

A Pan, JS Chan, A Zou, N Li, S Basart… - International …, 2023 - proceedings.mlr.press
Artificial agents have traditionally been trained to maximize reward, which may incentivize
power-seeking and deception, analogous to how next-token prediction in language models …

Discovering latent knowledge in language models without supervision

C Burns, H Ye, D Klein, J Steinhardt - arXiv preprint arXiv:2212.03827, 2022 - arxiv.org
Existing techniques for training language models can be misaligned with the truth: if we train
models with imitation learning, they may reproduce errors that humans make; if we train …

An overview of catastrophic AI risks

D Hendrycks, M Mazeika, T Woodside - arXiv preprint arXiv:2306.12001, 2023 - arxiv.org
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among
experts, policymakers, and world leaders regarding the potential for increasingly advanced …

Can we edit factual knowledge by in-context learning?

C Zheng, L Li, Q Dong, Y Fan, Z Wu, J Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Previous studies have shown that large language models (LLMs) like GPTs store massive
amounts of factual knowledge in their parameters. However, the stored knowledge could be false or out …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

The alignment problem from a deep learning perspective

R Ngo, L Chan, S Mindermann - arXiv preprint arXiv:2209.00626, 2022 - arxiv.org
In coming decades, artificial general intelligence (AGI) may surpass human capabilities at
many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to …