InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Graphical User Interface (GUI) Agents, powered by multimodal large language models
(MLLMs), have shown great potential for task automation on computing devices such as …
Harnessing Large Language Model for Virtual Reality Exploration Testing: A Case Study
As the Virtual Reality (VR) industry expands, the need for automated GUI testing is growing
rapidly. Large Language Models (LLMs), capable of retaining information long-term and …
Image, Text, and Speech Data Augmentation using Multimodal LLMs for Deep Learning: A Survey
In the past five years, research has shifted from traditional Machine Learning (ML) and Deep
Learning (DL) approaches to leveraging Large Language Models (LLMs), including …
Prompt2Task: Automating UI Tasks on Smartphones from Textual Prompts
UI task automation enables efficient task execution by simulating human interactions with
graphical user interfaces (GUIs), without modifying the existing application code. However …