GUI-World: A Dataset for GUI-oriented Multimodal LLM-based Agents

D Chen, Y Huang, S Wu, J Tang, L Chen, Y Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Multimodal Large Language Models (MLLMs) have been used as agents to
control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) …

UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

S Wu, Y Huang, C Gao, D Chen, Q Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted
various fields by enabling high-quality synthetic data generation and reducing dependence …

GUI-World: A Dataset for GUI-Orientated Multimodal Large Language Models

D Chen, Y Huang, S Wu, J Tang, H Zhou, Q Zhang… - 2024 - openreview.net
Recently, Multimodal Large Language Models (MLLMs) have been used as agents to
control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) …

DataGen: Unified Synthetic Dataset Generation via Large Language Models

Y Huang, S Wu, C Gao, D Chen, Q Zhang… - … Conference on Learning … - openreview.net
Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted
various fields by enabling high-quality synthetic data generation and reducing dependence …

GUI-WORLD: A GUI-oriented Video Dataset for Multimodal LLM-based Agents

D Chen, Y Huang, S Wu, J Tang, H Zhou… - Workshop on Video … - openreview.net
Recently, Multimodal Large Language Models (MLLMs) have been used as agents to
control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) …