The landscape of emerging AI agent architectures for reasoning, planning, and tool calling: A survey

T Masterman, S Besen, M Sawtell, A Chao - arxiv preprint arxiv …, 2024 - arxiv.org
This survey paper examines the recent advancements in AI agent implementations, with a
focus on their ability to achieve complex goals that require enhanced reasoning, planning …

Can large language models explore in-context?

A Krishnamurthy, K Harris, DJ Foster… - Advances in …, 2025 - proceedings.neurips.cc
We investigate the extent to which contemporary Large Language Models (LLMs) can
engage in exploration, a core capability in reinforcement learning and decision making. We …

Large multimodal agents: A survey

J Xie, Z Chen, R Zhang, X Wan, G Li - arxiv preprint arxiv:2402.15116, 2024 - arxiv.org
Large language models (LLMs) have achieved superior performance in powering text-
based AI agents, endowing them with decision-making and reasoning abilities akin to …

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

J Huang, EJ Li, MH Lam, T Liang, W Wang… - arxiv preprint arxiv …, 2024 - arxiv.org
Decision-making is a complex process requiring diverse abilities, making it an excellent
framework for evaluating Large Language Models (LLMs). Researchers have examined …

Agent-pro: Learning to evolve via policy-level reflection and optimization

W Zhang, K Tang, H Wu, M Wang, Y Shen… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse
tasks. However, most LLM-based agents are designed as specific task solvers with …

Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models

S Sicari, JF Cevallos M, A Rizzardi… - ACM Computing …, 2024 - dl.acm.org
This survey summarises the most recent methods for building and assessing helpful, honest,
and harmless neural language models, considering small, medium, and large-size models …

Cybench: A framework for evaluating cybersecurity capabilities and risks of language models

AK Zhang, N Perry, R Dulepet, J Ji, C Menders… - arxiv preprint arxiv …, 2024 - arxiv.org
Language Model (LM) agents for cybersecurity that are capable of autonomously identifying
vulnerabilities and executing exploits have potential to cause real-world impact …

MAgIC: Investigation of large language model powered multi-agent in cognition, adaptability, rationality and collaboration

L Xu, Z Hu, D Zhou, H Ren, Z Dong, K Keutzer… - arxiv preprint arxiv …, 2023 - arxiv.org
Large Language Models (LLMs) have marked a significant advancement in the field of
natural language processing, demonstrating exceptional capabilities in reasoning, tool …

Understanding the weakness of large language model agents within a complex Android environment

M Xing, R Zhang, H Xue, Q Chen, F Yang… - Proceedings of the 30th …, 2024 - dl.acm.org
Large language models (LLMs) have empowered intelligent agents to execute intricate
tasks within domain-specific software such as browsers and games. However, when applied …

LLMArena: Assessing capabilities of large language models in dynamic multi-agent environments

J Chen, X Hu, S Liu, S Huang, WW Tu, Z He… - arxiv preprint arxiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have revealed their potential for
achieving autonomous agents possessing human-level intelligence. However, existing …