Augmented language models: a survey
This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …
WebArena: A realistic web environment for building autonomous agents
With generative AI advances, the exciting potential for autonomous agents to manage daily
tasks via natural language commands has emerged. However, current agents are primarily …
AgentBench: Evaluating LLMs as agents
Large Language Models (LLMs) are becoming increasingly smart and autonomous,
targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there has …
WebShop: Towards scalable real-world web interaction with grounded language agents
Most existing benchmarks for grounding language in interactive environments either lack
realistic linguistic elements, or prove difficult to scale up due to substantial human …
Autonomous evaluation and refinement of digital agents
We show that domain-general automatic evaluators can significantly improve the
performance of agents for web navigation and device control. We experiment with multiple …
A data-driven approach for learning to control computers
It would be useful for machines to use computers as humans do so that they can aid us in
everyday tasks. This is a setting in which there is also the potential to leverage large-scale …
Understanding HTML with large language models
Large language models (LLMs) have shown exceptional performance on a variety of natural
language tasks. Yet, their capabilities for HTML understanding, i.e., parsing the raw HTML of …
Personal LLM agents: Insights and survey about the capability, efficiency and security
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …
Understanding the weakness of large language model agents within a complex Android environment
Large language models (LLMs) have empowered intelligent agents to execute intricate
tasks within domain-specific software such as browsers and games. However, when applied …
DigiRL: Training in-the-wild device-control agents with autonomous reinforcement learning
Training corpuses for vision language models (VLMs) typically lack sufficient amounts of
decision-centric data. This renders off-the-shelf VLMs sub-optimal for decision-making tasks …