Large language model-brained GUI agents: A survey
GUIs have long been central to human-computer interaction, providing an intuitive and
visually-driven way to access and interact with digital systems. The advent of LLMs …
GUI agents: A survey
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have
emerged as a transformative approach to automating human-computer interaction. These …
MobA: A two-level agent system for efficient mobile task automation
Current mobile assistants are limited by dependence on system APIs or struggle with
complex user instructions and diverse interfaces due to restricted comprehension and …
GUI Action Narrator: Where and When Did That Action Take Place?
The advent of Multimodal LLMs has significantly enhanced image OCR recognition
capabilities, making GUI automation a viable reality for increasing efficiency in digital tasks …
Grounding Multimodal Large Language Model in GUI World
GUI Screenshot - openreview.net
Recent advancements in Multimodal Large Language Models (MLLMs) have accelerated
the development of Graphical User Interface (GUI) agents capable of automating complex …