The rise and potential of large language model based agents: A survey

Z **, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

AI deception: A survey of examples, risks, and potential solutions

PS Park, S Goldstein, A O'Gara, M Chen, D Hendrycks - Patterns, 2024 - cell.com
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …

Avalon's game of thoughts: Battle against deception through recursive contemplation

S Wang, C Liu, Z Zheng, S Qi, S Chen, Q Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent breakthroughs in large language models (LLMs) have brought remarkable success
in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information …

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

J Huang, EJ Li, MH Lam, T Liang, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Decision-making is a complex process requiring diverse abilities, making it an excellent
framework for evaluating Large Language Models (LLMs). Researchers have examined …

LLM as a mastermind: A survey of strategic reasoning with large language models

Y Zhang, S Mao, T Ge, X Wang, A de Wynter… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents a comprehensive survey of the current status and opportunities for
Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning …

Evaluating frontier models for dangerous capabilities

M Phuong, M Aitchison, E Catt, S Cogan… - arXiv preprint arXiv …, 2024 - arxiv.org
To understand the risks posed by a new AI system, we must understand what it can and
cannot do. Building on prior work, we introduce a programme of new "dangerous capability" …

Put your money where your mouth is: Evaluating strategic planning and execution of LLM agents in an auction arena

J Chen, S Yuan, R Ye, BP Majumder… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advancements in Large Language Models (LLMs) showcase advanced reasoning,
yet NLP evaluations often depend on static benchmarks. Evaluating this necessitates …

Critical thinking in the age of generative AI

BZ Larson, C Moser, A Caza, K Muehlfeld… - … Learning & Education, 2024 - journals.aom.org
The rapid rise of generative artificial intelligence (GenAI) has prompted a vigorous
discussion about the role this technology should play in the business classroom (Adeshola …

Large language models can strategically deceive their users when put under pressure

J Scheurer, M Balesni, M Hobbhahn - arXiv preprint arXiv:2311.07590, 2023 - arxiv.org
We demonstrate a situation in which Large Language Models, trained to be helpful,
harmless, and honest, can display misaligned behavior and strategically deceive their users …

AgentLens: Visual analysis for agent behaviors in LLM-based autonomous systems

J Lu, B Pan, J Chen, Y Feng, J Hu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Recently, Large Language Model based Autonomous System (LLMAS) has gained great
popularity for its potential to simulate complicated behaviors of human societies. One of its …