Augmented language models: a survey

G Mialon, R Dessì, M Lomeli, C Nalmpantis… - arxiv preprint arxiv …, 2023 - arxiv.org
This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …

WebArena: A realistic web environment for building autonomous agents

S Zhou, FF Xu, H Zhu, X Zhou, R Lo, A Sridhar… - arxiv preprint arxiv …, 2023 - arxiv.org
With generative AI advances, the exciting potential for autonomous agents to manage daily
tasks via natural language commands has emerged. However, current agents are primarily …

AgentBench: Evaluating LLMs as agents

X Liu, H Yu, H Zhang, Y Xu, X Lei, H Lai, Y Gu… - arxiv preprint arxiv …, 2023 - arxiv.org
Large Language Models (LLMs) are becoming increasingly smart and autonomous,
targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there has …

WebShop: Towards scalable real-world web interaction with grounded language agents

S Yao, H Chen, J Yang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most existing benchmarks for grounding language in interactive environments either lack
realistic linguistic elements, or prove difficult to scale up due to substantial human …

Autonomous evaluation and refinement of digital agents

J Pan, Y Zhang, N Tomlin, Y Zhou, S Levine… - arxiv preprint arxiv …, 2024 - arxiv.org
We show that domain-general automatic evaluators can significantly improve the
performance of agents for web navigation and device control. We experiment with multiple …

A data-driven approach for learning to control computers

PC Humphreys, D Raposo, T Pohlen… - International …, 2022 - proceedings.mlr.press
It would be useful for machines to use computers as humans do so that they can aid us in
everyday tasks. This is a setting in which there is also the potential to leverage large-scale …

Understanding HTML with large language models

I Gur, O Nachum, Y Miao, M Safdari, A Huang… - arxiv preprint arxiv …, 2022 - arxiv.org
Large language models (LLMs) have shown exceptional performance on a variety of natural
language tasks. Yet, their capabilities for HTML understanding, i.e., parsing the raw HTML of …

Personal LLM agents: Insights and survey about the capability, efficiency and security

Y Li, H Wen, W Wang, X Li, Y Yuan, G Liu, J Liu… - arxiv preprint arxiv …, 2024 - arxiv.org
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …

Understanding the weakness of large language model agents within a complex Android environment

M Xing, R Zhang, H Xue, Q Chen, F Yang… - Proceedings of the 30th …, 2024 - dl.acm.org
Large language models (LLMs) have empowered intelligent agents to execute intricate
tasks within domain-specific software such as browsers and games. However, when applied …

DigiRL: Training in-the-wild device-control agents with autonomous reinforcement learning

H Bai, Y Zhou, M Cemri, J Pan, A Suhr, S Levine… - arxiv preprint arxiv …, 2024 - arxiv.org
Training corpuses for vision language models (VLMs) typically lack sufficient amounts of
decision-centric data. This renders off-the-shelf VLMs sub-optimal for decision-making tasks …