- Academic Search

Foundations and recent trends in multimodal mobile agents: A survey

B Wu, Y Li, M Fang, Z Song, Z Zhang, Y Wei… - arXiv preprint arXiv …, 2024 - arxiv.org

Mobile agents are essential for automating tasks in complex and dynamic mobile
environments. As foundation models evolve, the demands for agents that can adapt in real …

Cited by 3

Evaluating frontier models for dangerous capabilities

M Phuong, M Aitchison, E Catt, S Cogan… - arXiv preprint arXiv …, 2024 - arxiv.org

To understand the risks posed by a new AI system, we must understand what it can and
cannot do. Building on prior work, we introduce a programme of new "dangerous capability" …

Cited by 45

Vision-language model-based human-robot collaboration for smart manufacturing: A state-of-the-art survey

J Fan, Y Yin, T Wang, W Dong, P Zheng… - Frontiers of Engineering …, 2025 - Springer

Human-robot collaboration (HRC) is set to transform the manufacturing paradigm by
leveraging the strengths of human flexibility and robot precision. The recent breakthrough of …

Large language model-brained GUI agents: A survey

C Zhang, S He, J Qian, B Li, L Li, S Qin, Y Kang… - arXiv preprint arXiv …, 2024 - arxiv.org

GUIs have long been central to human-computer interaction, providing an intuitive and
visually-driven way to access and interact with digital systems. The advent of LLMs …

Cited by 5

Generalist virtual agents: A survey on autonomous agents across digital platforms

M Gao, W Bu, B Miao, Y Wu, Y Li, J Li, S Tang… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we introduce the Generalist Virtual Agent (GVA), an autonomous entity
engineered to function across diverse digital platforms and environments, assisting users by …

Cited by 3

Naviqate: Functionality-guided web application navigation

M Shahbandeh, P Alian, N Nashid… - arXiv preprint arXiv …, 2024 - arxiv.org

End-to-end web testing is challenging due to the need to explore diverse web application
functionalities. Current state-of-the-art methods, such as WebCanvas, are not designed for …

Cited by 3

GUI agents: A survey

D Nguyen, J Chen, Y Wang, G Wu, N Park, Z Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Graphical User Interface (GUI) agents, powered by Large Foundation Models, have
emerged as a transformative approach to automating human-computer interaction. These …

Cited by 1

AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants

PJ Sager, B Meyer, P Yan… - arXiv preprint arXiv …, 2025 - arxiv.org

Instruction-based computer control agents (CCAs) execute complex action sequences on
personal computers or mobile devices to fulfill tasks using the same graphical user …

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

W Liu, L Liu, Y Guo, H Xiao, W Lin, Y Chai, S Ren… - 2025 - preprints.org

With the rapid rise of large language models (LLMs), phone automation has undergone
transformative changes. This paper systematically reviews LLM-driven phone GUI agents …

Cited by 1

VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

Y Li, B Hu, H Shi, W Wang, L Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Multimodal Models (LMMs) have achieved impressive success in visual
understanding and reasoning, remarkably improving the performance of mathematical …

Cited by 15
