Large language model-brained GUI agents: A survey

C Zhang, S He, J Qian, B Li, L Li, S Qin, Y Kang… - arxiv preprint arxiv …, 2024 - arxiv.org
GUIs have long been central to human-computer interaction, providing an intuitive and
visually-driven way to access and interact with digital systems. The advent of LLMs …

A human-inspired reading agent with gist memory of very long contexts

KH Lee, X Chen, H Furuta, J Canny… - arxiv preprint arxiv …, 2024 - arxiv.org
Current Large Language Models (LLMs) are not only limited to some maximum context
length, but also are not able to robustly consume long inputs. To address these limitations …

Tur[k]ingBench: A challenge benchmark for web agents

K Xu, Y Kordi, T Nayak, A Asija, Y Wang… - arxiv preprint arxiv …, 2024 - arxiv.org
Can advanced multi-modal models effectively tackle complex web-based tasks? Such tasks
are often found on crowdsourcing platforms, where crowdworkers engage in challenging …

AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants

PJ Sager, B Meyer, P Yan… - arxiv preprint arxiv …, 2025 - arxiv.org
Instruction-based computer control agents (CCAs) execute complex action sequences on
personal computers or mobile devices to fulfill tasks using the same graphical user …

Meta-task planning for language agents

C Zhang, DGX Deik, D Li, H Zhang, Y Liu - arxiv preprint arxiv …, 2024 - arxiv.org
The rapid advancement of neural language models has sparked a new surge of intelligent
agent research. Unlike traditional agents, large language model-based agents (LLM agents) …

GUI agents: A survey

D Nguyen, J Chen, Y Wang, G Wu, N Park, Z Hu… - arxiv preprint arxiv …, 2024 - arxiv.org
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have
emerged as a transformative approach to automating human-computer interaction. These …

Infrastructure for AI Agents

A Chan, K Wei, S Huang, N Rajkumar, E Perrier… - arxiv preprint arxiv …, 2025 - arxiv.org
Increasingly many AI systems can plan and execute interactions in open-ended
environments, such as making phone calls or buying online goods. As developers grow the …

Planning with Multi-Constraints via Collaborative Language Agents

C Zhang, XD Goh, D Li, H Zhang… - Proceedings of the 31st …, 2025 - aclanthology.org
The rapid advancement of neural language models has sparked a new surge of intelligent
agent research. Unlike traditional agents, large language model-based agents (LLM agents) …

Generalizing to any diverse distribution: uniformity, gentle finetuning and rebalancing

A Loukas, K Martinkus, E Wagstaff, K Cho - arxiv preprint arxiv …, 2024 - arxiv.org
As training datasets grow larger, we aspire to develop models that generalize well to any
diverse test distribution, even if the latter deviates significantly from the training data. Various …

OS agents: A survey on MLLM-based agents for general computing devices use

X Hu, T Xiong, B Yi, Z Wei, R Xiao, Y Chen, J Ye, M Tao… - 2024 - preprints.org
The dream to create AI assistants as capable and versatile as the fictional JARVIS from Iron
Man has long captivated imaginations. With the evolution of (multimodal) large language …