Vision-Based Multimodal Interfaces: A Survey and Taxonomy for Enhanced Context-Aware System Design

Y Hu, J Tang, X Gong, Z Zhou, S Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
The recent surge in artificial intelligence, particularly in multimodal processing technology,
has advanced human-computer interaction by altering how intelligent systems perceive …

Empowering smart glasses with large language models: Towards ubiquitous AGI

D Zhang, Y Li, Z He, X Li - Companion of the 2024 on ACM International …, 2024 - dl.acm.org
Smart glasses, augmented by advances in multimodal Large Language Models (LLMs), are
at the forefront of creating ubiquitous Artificial General Intelligence (AGI). This short literature …

NoTeeline: Supporting Real-Time, Personalized Notetaking with LLM-Enhanced Micronotes

F Huq, A Samee, DC Lin, XA Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
Taking notes quickly while effectively capturing key information can be challenging,
especially when watching videos that present simultaneous visual and auditory streams …

Vision-Based Multimodal Interfaces: A Survey and Taxonomy for Enhanced Context-Aware System Design

J Tang, X Gong, Z Zhou, S Zhang, DS Elvitigala, W Hu… - 2025 - 3dvar.com
The recent surge in artificial intelligence, particularly in multimodal processing technology,
has advanced human-computer interaction by altering how intelligent systems perceive …