Agent-Pro: Learning to evolve via policy-level reflection and optimization

W Zhang, K Tang, H Wu, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models exhibit robust problem-solving capabilities for diverse tasks.
However, most LLM-based agents are designed as specific task solvers with sophisticated …

RoboGolf: Mastering real-world minigolf with a reflective multi-modality vision-language model

H Zhou, T Ji, L Sommerhalder, M Goerner… - arXiv preprint arXiv …, 2024 - arxiv.org
Minigolf is an exemplary real-world game for examining embodied intelligence, requiring
challenging spatial and kinodynamic understanding to putt the ball. Additionally, reflective …

ClickAgent: Enhancing UI location capabilities of autonomous agents

J Hoscilowicz, B Maj, B Kozakiewicz… - arXiv preprint arXiv …, 2024 - arxiv.org
With the growing reliance on digital devices equipped with graphical user interfaces (GUIs),
such as computers and smartphones, the need for effective automation tools has become …

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Z Wang, S Cai, Z Mu, H Lin, C Zhang, X Liu, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-
world instruction-following agents in Minecraft. Compared to prior works that either emit …

Autonomous Mental Development at the Individual and Collective Levels: Concept and Challenges

M Lippi, S Mariani, M Martinelli, F Zambonelli - IEEE Access, 2024 - ieeexplore.ieee.org
The increasing complexity and unpredictability of many ICT scenarios let us envision that
future systems will have to dynamically learn how to act and adapt to face evolving situations …

A taxonomy of architecture options for foundation model-based agents: Analysis and decision model

J Zhou, Q Lu, J Chen, L Zhu, X Xu, Z Xing… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of AI technology has led to widespread applications of agent
systems across various domains. However, the need for detailed architecture design poses …

AI to publish knowledge: a tectonic shift

T Lemberger - EMBO reports, 2024 - embopress.org
The rise of generative AI will transform scientific publishing but it also poses risks. While AI
enables the dissemination of knowledge in computable form, preserving transparency and …

Position: Foundation Agents as the Paradigm Shift for Decision Making

X Liu, X Lou, J Jiao, J Zhang - arXiv preprint arXiv:2405.17009, 2024 - arxiv.org
Decision making demands intricate interplay between perception, memory, and reasoning to
discern optimal policies. Conventional approaches to decision making face challenges …

Smart Mobility with Agent-based Foundation Models: Towards Interactive and Collaborative Intelligent Vehicles

B Xia, P Xie, J Wang - IEEE Transactions on Intelligent Vehicles, 2024 - ieeexplore.ieee.org
This letter reports the insights gained during a Distributed/Decentralized Hybrid Workshop
on Foundation/Infrastructure Intelligence (FII), where we discussed the evolving role of …

Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models

KG Barman, S Caron, E Sullivan, HW de Regt… - arXiv preprint arXiv …, 2025 - arxiv.org
This paper explores ideas and provides a potential roadmap for the development and
evaluation of physics-specific large-scale AI models, which we call Large Physics Models …