WebShop: Towards scalable real-world web interaction with grounded language agents

S Yao, H Chen, J Yang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most existing benchmarks for grounding language in interactive environments either lack
realistic linguistic elements, or prove difficult to scale up due to substantial human …

From pixels to UI actions: Learning to follow instructions via graphical user interfaces

P Shaw, M Joshi, J Cohan, J Berant… - Advances in …, 2023 - proceedings.neurips.cc
Much of the previous work towards digital agents for graphical user interfaces (GUIs) has
relied on text-based representations (derived from HTML or other structured data sources) …

Understanding HTML with large language models

I Gur, O Nachum, Y Miao, M Safdari, A Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) have shown exceptional performance on a variety of natural
language tasks. Yet, their capabilities for HTML understanding--i.e., parsing the raw HTML of …

WebUI: A dataset for enhancing visual UI understanding with web semantics

J Wu, S Wang, S Shen, YH Peng, J Nichols… - Proceedings of the …, 2023 - dl.acm.org
Modeling user interfaces (UIs) from visual information allows systems to make inferences
about the functionality and semantics needed to support use cases in accessibility, app …

Unblind text inputs: predicting hint-text of text input in mobile apps via LLM

Z Liu, C Chen, J Wang, M Chen, B Wu… - Proceedings of the …, 2024 - dl.acm.org
Mobile apps have become indispensable for accessing and participating in various
environments, especially for low-vision users. Users with visual impairments can use screen …

HeaP: Hierarchical policies for web actions using LLMs

P Sodhi, SRK Branavan, R McDonald - 2023 - openreview.net
Large language models (LLMs) have demonstrated remarkable capabilities in performing a
range of instruction-following tasks in few and zero-shot settings. However, teaching LLMs to …

Visual grounding for desktop graphical user interfaces

T Dardouri, L Minkova, JL Espejel, W Dahhane… - arXiv preprint arXiv …, 2024 - arxiv.org
Most instance perception and image understanding solutions focus mainly on natural
images. However, applications for synthetic images, and more specifically, images of …

Language Agents: From Next-Token Prediction to Digital Automation

S Yao - 2024 - search.proquest.com
Building autonomous agents to interact with the world lies at the core of artificial intelligence
(AI). This thesis introduces "language agents", a new category of agents that utilize large …

Learning Language through Interactions with the Digital World

JB Yang - 2023 - search.proquest.com
A noteworthy omission in the development process of common NLP models is the lack of
interactive components. While common downstream applications of large language models …