Foundations and recent trends in multimodal mobile agents: A survey

B Wu, Y Li, M Fang, Z Song, Z Zhang, Y Wei… - arxiv preprint arxiv …, 2024 - arxiv.org
Mobile agents are essential for automating tasks in complex and dynamic mobile
environments. As foundation models evolve, the demands for agents that can adapt in real …

Evaluating frontier models for dangerous capabilities

M Phuong, M Aitchison, E Catt, S Cogan… - arxiv preprint arxiv …, 2024 - arxiv.org
To understand the risks posed by a new AI system, we must understand what it can and
cannot do. Building on prior work, we introduce a programme of new" dangerous capability" …

Vision-language model-based human-robot collaboration for smart manufacturing: A state-of-the-art survey

J Fan, Y Yin, T Wang, W Dong, P Zheng… - Frontiers of Engineering …, 2025 - Springer
Abstract human-robot collaboration (HRC) is set to transform the manufacturing paradigm by
leveraging the strengths of human flexibility and robot precision. The recent breakthrough of …

Large language model-brained gui agents: A survey

C Zhang, S He, J Qian, B Li, L Li, S Qin, Y Kang… - arxiv preprint arxiv …, 2024 - arxiv.org
GUIs have long been central to human-computer interaction, providing an intuitive and
visually-driven way to access and interact with digital systems. The advent of LLMs …

Generalist virtual agents: A survey on autonomous agents across digital platforms

M Gao, W Bu, B Miao, Y Wu, Y Li, J Li, S Tang… - arxiv preprint arxiv …, 2024 - arxiv.org
In this paper, we introduce the Generalist Virtual Agent (GVA), an autonomous entity
engineered to function across diverse digital platforms and environments, assisting users by …

Naviqate: Functionality-guided web application navigation

M Shahbandeh, P Alian, N Nashid… - arxiv preprint arxiv …, 2024 - arxiv.org
End-to-end web testing is challenging due to the need to explore diverse web application
functionalities. Current state-of-the-art methods, such as WebCanvas, are not designed for …

Gui agents: A survey

D Nguyen, J Chen, Y Wang, G Wu, N Park, Z Hu… - arxiv preprint arxiv …, 2024 - arxiv.org
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have
emerged as a transformative approach to automating human-computer interaction. These …

AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants

PJ Sager, B Meyer, P Yan… - arxiv preprint arxiv …, 2025 - arxiv.org
Instruction-based computer control agents (CCAs) execute complex action sequences on
personal computers or mobile devices to fulfill tasks using the same graphical user …

[PDF][PDF] LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

W Liu, L Liu, Y Guo, H **ao, W Lin, Y Chai, S Ren… - 2025 - preprints.org
With the rapid rise of large language models (LLMs), phone automation has undergone
transformative changes. This paper systematically reviews LLM-driven phone GUI agents …

VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

Y Li, B Hu, H Shi, W Wang, L Wang… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Multimodal Models (LMMs) have achieved impressive success in visual
understanding and reasoning, remarkably improving the performance of mathematical …