Tool learning with foundation models

Y Qin, S Hu, Y Lin, W Chen, N Ding, G Cui… - ACM Computing …, 2024 - dl.acm.org
Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …

Mind2Web: Towards a generalist agent for the web

X Deng, Y Gu, B Zheng, S Chen… - Advances in …, 2023 - proceedings.neurips.cc
We introduce Mind2Web, the first dataset for developing and evaluating generalist
agents for the web that can follow language instructions to complete complex tasks on any …

Language models can solve computer tasks

G Kim, P Baldi, S McAleer - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Agents capable of carrying out general tasks on a computer can improve efficiency and
productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally …

WebArena: A realistic web environment for building autonomous agents

S Zhou, FF Xu, H Zhu, X Zhou, R Lo, A Sridhar… - arXiv preprint arXiv …, 2023 - arxiv.org
With advances in generative AI, there is now potential for autonomous agents to manage
daily tasks via natural language commands. However, current agents are primarily created …

Android in the Wild: A large-scale dataset for Android device control

C Rawles, A Li, D Rodriguez… - Advances in Neural …, 2023 - proceedings.neurips.cc
There is a growing interest in device-control systems that can interpret human natural
language instructions and execute them on a digital device by directly controlling its user …

AutoGen: Enabling next-gen LLM applications via multi-agent conversation

Q Wu, G Bansal, J Zhang, Y Wu, B Li, E Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
AutoGen is an open-source framework that allows developers to build LLM applications via
multiple agents that can converse with each other to accomplish tasks. AutoGen agents are …

Personal LLM agents: Insights and survey about the capability, efficiency and security

Y Li, H Wen, W Wang, X Li, Y Yuan, G Liu, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …

GPT-4V(ision) is a generalist web agent, if grounded

B Zheng, B Gou, J Kil, H Sun, Y Su - arXiv preprint arXiv:2401.01614, 2024 - arxiv.org
The recent development on large multimodal models (LMMs), especially GPT-4V(ision) and
Gemini, has been quickly expanding the capability boundaries of multimodal models …

WebShop: Towards scalable real-world web interaction with grounded language agents

S Yao, H Chen, J Yang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most existing benchmarks for grounding language in interactive environments either lack
realistic linguistic elements, or prove difficult to scale up due to substantial human …

Language agent tree search unifies reasoning acting and planning in language models

A Zhou, K Yan, M Shlapentokh-Rothman… - arXiv preprint arXiv …, 2023 - arxiv.org
While language models (LMs) have shown potential across a range of decision-making
tasks, their reliance on simple acting processes limits their broad deployment as …