Language models can solve computer tasks

G Kim, P Baldi, S McAleer - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Agents capable of carrying out general tasks on a computer can improve efficiency and
productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally …

WebShop: Towards scalable real-world web interaction with grounded language agents

S Yao, H Chen, J Yang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most existing benchmarks for grounding language in interactive environments either lack
realistic linguistic elements, or prove difficult to scale up due to substantial human …

Personal LLM agents: Insights and survey about the capability, efficiency and security

Y Li, H Wen, W Wang, X Li, Y Yuan, G Liu, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …

Enabling conversational interaction with mobile UI using large language models

B Wang, G Li, Y Li - Proceedings of the 2023 CHI Conference on Human …, 2023 - dl.acm.org
Conversational agents show the promise to allow users to interact with mobile devices using
language. However, to perform diverse UI tasks with natural language, developers typically …

Understanding HTML with large language models

I Gur, O Nachum, Y Miao, M Safdari, A Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) have shown exceptional performance on a variety of natural
language tasks. Yet, their capabilities for HTML understanding, i.e., parsing the raw HTML of …

A dataset for interactive vision-language navigation with unknown command feasibility

A Burns, D Arsan, S Agrawal, R Kumar… - … on Computer Vision, 2022 - Springer
Vision-language navigation (VLN), in which an agent follows language instruction
in a visual environment, has been studied under the premise that the input command is fully …

Screen2Vec: Semantic embedding of GUI screens and GUI components

TJJ Li, L Popowski, T Mitchell, BA Myers - Proceedings of the 2021 CHI …, 2021 - dl.acm.org
Representing the semantics of GUI screens and components is crucial to data-driven
computational methods for modeling user-GUI interactions and mining GUI designs. Existing …

WebLINX: Real-world website navigation with multi-turn dialogue

XH Lù, Z Kasner, S Reddy - arXiv preprint arXiv:2402.05930, 2024 - arxiv.org
We propose the problem of conversational web navigation, where a digital agent controls a
web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue …

AssistGUI: Task-oriented PC graphical user interface automation

D Gao, L Ji, Z Bai, M Ouyang, P Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Graphical User Interface (GUI) automation holds significant promise for assisting
users with complex tasks, thereby boosting human productivity. Existing works leveraging …

META-GUI: Towards multi-modal conversational agents on mobile GUI

L Sun, X Chen, L Chen, T Dai, Z Zhu, K Yu - arXiv preprint arXiv …, 2022 - arxiv.org
Task-oriented dialogue (TOD) systems have been widely used by mobile phone intelligent
assistants to accomplish tasks such as calendar scheduling or hotel reservation. Current …