- Academic Search

Language models can solve computer tasks

G Kim, P Baldi, S McAleer - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Agents capable of carrying out general tasks on a computer can improve efficiency and
productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally …

Cited by 327 · All 6 versions

WebShop: Towards scalable real-world web interaction with grounded language agents

S Yao, H Chen, J Yang… - Advances in Neural …, 2022 - proceedings.neurips.cc

Most existing benchmarks for grounding language in interactive environments either lack
realistic linguistic elements, or prove difficult to scale up due to substantial human …

Cited by 368 · All 7 versions

Personal LLM agents: Insights and survey about the capability, efficiency and security

Y Li, H Wen, W Wang, X Li, Y Yuan, G Liu, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …

Cited by 131 · All 3 versions

Enabling conversational interaction with mobile UI using large language models

B Wang, G Li, Y Li - Proceedings of the 2023 CHI Conference on Human …, 2023 - dl.acm.org

Conversational agents show the promise to allow users to interact with mobile devices using
language. However, to perform diverse UI tasks with natural language, developers typically …

Cited by 177 · All 5 versions

Understanding HTML with large language models

I Gur, O Nachum, Y Miao, M Safdari, A Huang… - arXiv preprint arXiv …, 2022 - arxiv.org

Large language models (LLMs) have shown exceptional performance on a variety of natural
language tasks. Yet, their capabilities for HTML understanding, i.e., parsing the raw HTML of …

Cited by 85 · All 5 versions

A dataset for interactive vision-language navigation with unknown command feasibility

A Burns, D Arsan, S Agrawal, R Kumar… - … on Computer Vision, 2022 - Springer

Vision-language navigation (VLN), in which an agent follows language instruction
in a visual environment, has been studied under the premise that the input command is fully …

Cited by 63 · All 8 versions

Screen2Vec: Semantic embedding of GUI screens and GUI components

TJJ Li, L Popowski, T Mitchell, BA Myers - Proceedings of the 2021 CHI …, 2021 - dl.acm.org

Representing the semantics of GUI screens and components is crucial to data-driven
computational methods for modeling user-GUI interactions and mining GUI designs. Existing …

Cited by 113 · All 5 versions

WebLINX: Real-world website navigation with multi-turn dialogue

XH Lù, Z Kasner, S Reddy - arXiv preprint arXiv:2402.05930, 2024 - arxiv.org

We propose the problem of conversational web navigation, where a digital agent controls a
web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue …

Cited by 30 · All 7 versions

AssistGUI: Task-oriented PC graphical user interface automation

D Gao, L Ji, Z Bai, M Ouyang, P Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

Graphical User Interface (GUI) automation holds significant promise for assisting
users with complex tasks thereby boosting human productivity. Existing works leveraging …

Cited by 6 · All 3 versions

META-GUI: Towards multi-modal conversational agents on mobile GUI

L Sun, X Chen, L Chen, T Dai, Z Zhu, K Yu - arXiv preprint arXiv …, 2022 - arxiv.org

Task-oriented dialogue (TOD) systems have been widely used by mobile phone intelligent
assistants to accomplish tasks such as calendar scheduling or hotel reservation. Current …

Cited by 51 · All 5 versions

Mapping natural language commands to web elements

Language models can solve computer tasks
