Augmented language models: a survey

G Mialon, R Dessì, M Lomeli, C Nalmpantis… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …

Cited by 504 · All 3 versions

[PDF] arxiv.org

WebArena: A realistic web environment for building autonomous agents

S Zhou, FF Xu, H Zhu, X Zhou, R Lo, A Sridhar… - arXiv preprint arXiv …, 2023 - arxiv.org

With generative AI advances, the exciting potential for autonomous agents to manage daily
tasks via natural language commands has emerged. However, current agents are primarily …

Cited by 256 · All 4 versions

[PDF] openreview.net

AgentBench: Evaluating LLMs as agents

X Liu, H Yu, H Zhang, Y Xu, X Lei, H Lai, Y Gu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) are becoming increasingly smart and autonomous,
targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there has …

Cited by 255 · All 3 versions

[PDF] neurips.cc

WebShop: Towards scalable real-world web interaction with grounded language agents

S Yao, H Chen, J Yang… - Advances in Neural …, 2022 - proceedings.neurips.cc

Most existing benchmarks for grounding language in interactive environments either lack
realistic linguistic elements, or prove difficult to scale up due to substantial human …

Cited by 342 · All 7 versions

[PDF] arxiv.org

Autonomous evaluation and refinement of digital agents

J Pan, Y Zhang, N Tomlin, Y Zhou, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org

We show that domain-general automatic evaluators can significantly improve the
performance of agents for web navigation and device control. We experiment with multiple …

Cited by 36 · All 2 versions

[PDF] mlr.press

A data-driven approach for learning to control computers

PC Humphreys, D Raposo, T Pohlen… - International …, 2022 - proceedings.mlr.press

It would be useful for machines to use computers as humans do so that they can aid us in
everyday tasks. This is a setting in which there is also the potential to leverage large-scale …

Cited by 107 · All 4 versions

[PDF] arxiv.org

Understanding HTML with large language models

I Gur, O Nachum, Y Miao, M Safdari, A Huang… - arXiv preprint arXiv …, 2022 - arxiv.org

Large language models (LLMs) have shown exceptional performance on a variety of natural
language tasks. Yet, their capabilities for HTML understanding, i.e., parsing the raw HTML of …

Cited by 87 · All 5 versions

[PDF] arxiv.org

Personal LLM agents: Insights and survey about the capability, efficiency and security

Y Li, H Wen, W Wang, X Li, Y Yuan, G Liu, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …

Cited by 116 · All 3 versions

[PDF] arxiv.org

Understanding the weakness of large language model agents within a complex android environment

M Xing, R Zhang, H Xue, Q Chen, F Yang… - Proceedings of the 30th …, 2024 - dl.acm.org

Large language models (LLMs) have empowered intelligent agents to execute intricate
tasks within domain-specific software such as browsers and games. However, when applied …

Cited by 20 · All 2 versions

[PDF] arxiv.org

DigiRL: Training in-the-wild device-control agents with autonomous reinforcement learning

H Bai, Y Zhou, M Cemri, J Pan, A Suhr, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org

Training corpuses for vision language models (VLMs) typically lack sufficient amounts of
decision-centric data. This renders off-the-shelf VLMs sub-optimal for decision-making tasks …

Cited by 20

AndroidEnv: A reinforcement learning platform for Android
