Foundations and recent trends in multimodal mobile agents: A survey
Mobile agents are essential for automating tasks in complex and dynamic mobile
environments. As foundation models evolve, the demands for agents that can adapt in real …
Evaluating frontier models for dangerous capabilities
To understand the risks posed by a new AI system, we must understand what it can and
cannot do. Building on prior work, we introduce a programme of new "dangerous capability" …
Vision-language model-based human-robot collaboration for smart manufacturing: A state-of-the-art survey
Human-robot collaboration (HRC) is set to transform the manufacturing paradigm by
leveraging the strengths of human flexibility and robot precision. The recent breakthrough of …
Large language model-brained GUI agents: A survey
GUIs have long been central to human-computer interaction, providing an intuitive and
visually-driven way to access and interact with digital systems. The advent of LLMs …
Generalist virtual agents: A survey on autonomous agents across digital platforms
In this paper, we introduce the Generalist Virtual Agent (GVA), an autonomous entity
engineered to function across diverse digital platforms and environments, assisting users by …
Naviqate: Functionality-guided web application navigation
End-to-end web testing is challenging due to the need to explore diverse web application
functionalities. Current state-of-the-art methods, such as WebCanvas, are not designed for …
GUI agents: A survey
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have
emerged as a transformative approach to automating human-computer interaction. These …
AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants
Instruction-based computer control agents (CCAs) execute complex action sequences on
personal computers or mobile devices to fulfill tasks using the same graphical user …
LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
With the rapid rise of large language models (LLMs), phone automation has undergone
transformative changes. This paper systematically reviews LLM-driven phone GUI agents …
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
Large Multimodal Models (LMMs) have achieved impressive success in visual
understanding and reasoning, remarkably improving the performance of mathematical …