InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

Y Liu, P Li, Z Wei, C Xie, X Hu, X Xu, S Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Graphical User Interface (GUI) Agents, powered by multimodal large language models
(MLLMs), have shown great potential for task automation on computing devices such as …

Harnessing Large Language Model for Virtual Reality Exploration Testing: A Case Study

Z Qi, H Li, H Qin, K Peng, S He, X Qin - arXiv preprint arXiv:2501.05625, 2025 - arxiv.org
As the Virtual Reality (VR) industry expands, the need for automated GUI testing is growing
rapidly. Large Language Models (LLMs), capable of retaining information long-term and …

Image, Text, and Speech Data Augmentation using Multimodal LLMs for Deep Learning: A Survey

R Sapkota, S Raza, M Shoman, A Paudel… - arXiv preprint arXiv …, 2025 - arxiv.org
In the past five years, research has shifted from traditional Machine Learning (ML) and Deep
Learning (DL) approaches to leveraging Large Language Models (LLMs), including …

Prompt2Task: Automating UI Tasks on Smartphones from Textual Prompts

T Huang, C Yu, W Shi, Z Peng, D Yang, W Sun… - ACM Transactions on … - dl.acm.org
UI task automation enables efficient task execution by simulating human interactions with
graphical user interfaces (GUIs), without modifying the existing application code. However …