InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

Y Liu, P Li, Z Wei, C Xie, X Hu, X Xu, S Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Graphical User Interface (GUI) Agents, powered by multimodal large language models
(MLLMs), have shown great potential for task automation on computing devices such as …

Harnessing Large Language Model for Virtual Reality Exploration Testing: A Case Study

Z Qi, H Li, H Qin, K Peng, S He, X Qin - arXiv preprint arXiv:2501.05625, 2025 - arxiv.org
As the Virtual Reality (VR) industry expands, the need for automated GUI testing is growing
rapidly. Large Language Models (LLMs), capable of retaining information long-term and …

Image, Text, and Speech Data Augmentation using Multimodal LLMs for Deep Learning: A Survey

R Sapkota, S Raza, M Shoman, A Paudel… - arXiv preprint arXiv …, 2025 - arxiv.org
In the past five years, research has shifted from traditional Machine Learning (ML) and Deep
Learning (DL) approaches to leveraging Large Language Models (LLMs), including …

Prompt2Task: Automating UI Tasks on Smartphones from Textual Prompts

T Huang, C Yu, W Shi, Z Peng, D Yang, W Sun… - ACM Transactions on … - dl.acm.org
UI task automation enables efficient task execution by simulating human interactions with
graphical user interfaces (GUIs), without modifying the existing application code. However …