Large Language Model-Brained GUI Agents: A Survey

C Zhang, S He, J Qian, B Li, L Li, S Qin, Y Kang… - arXiv preprint arXiv …, 2024 - arxiv.org
GUIs have long been central to human-computer interaction, providing an intuitive and
visually-driven way to access and interact with digital systems. The advent of LLMs …

GUI Agents: A Survey

D Nguyen, J Chen, Y Wang, G Wu, N Park, Z Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have
emerged as a transformative approach to automating human-computer interaction. These …

MobA: A Two-Level Agent System for Efficient Mobile Task Automation

Z Zhu, H Tang, Y Li, K Lan, Y Jiang, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Current mobile assistants are limited by dependence on system APIs or struggle with
complex user instructions and diverse interfaces due to restricted comprehension and …

GUI Action Narrator: Where and When Did That Action Take Place?

Q Wu, D Gao, KQ Lin, Z Wu, X Guo, P Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Multimodal LLMs has significantly enhanced image OCR recognition
capabilities, making GUI automation a viable reality for increasing efficiency in digital tasks …

Grounding Multimodal Large Language Model in GUI World

openreview.net
Recent advancements in Multimodal Large Language Models (MLLMs) have accelerated
the development of Graphical User Interface (GUI) agents capable of automating complex …