OmniActions: Predicting digital actions in response to real-world multimodal sensory inputs with LLMs

JN Li, Y Xu, T Grossman, S Santosa, M Li - Proceedings of the 2024 CHI …, 2024 - dl.acm.org
The progression to “Pervasive Augmented Reality” envisions easy access to multimodal
information continuously. However, in many everyday scenarios, users are occupied …

Data Playwright: Authoring data videos with annotated narration

L Shen, H Li, Y Wang, T Luo, Y Luo… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Creating data videos that effectively narrate stories with animated visuals requires
substantial effort and expertise. A promising research trend is leveraging the easy-to-use …

PodReels: Human-AI Co-Creation of Video Podcast Teasers

S Wang, Z Ning, A Truong, M Dontcheva, D Li… - Proceedings of the …, 2024 - dl.acm.org
Video podcast teasers are short videos that can be shared on social media platforms to
capture interest in full episodes of a video podcast. These teasers enable long-form …

Hookpad Aria: A Copilot for Songwriters

C Donahue, SL Wu, Y Kim, D Carlton… - arXiv preprint arXiv …, 2025 - arxiv.org
We present Hookpad Aria, a generative AI system designed to assist musicians in writing
Western pop songs. Our system is seamlessly integrated into Hookpad, a web-based editor …

Reframe Anything: LLM agent for open world video reframing

J Cao, Y Wu, W Chi, W Zhu, Z Su, J Wu - arXiv preprint arXiv:2403.06070, 2024 - arxiv.org
The proliferation of mobile devices and social media has revolutionized content
dissemination, with short-form video becoming increasingly prevalent. This shift has …

Enabling harmonious human-machine interaction with visual-context augmented dialogue system: A review

H Wang, B Guo, Y Zeng, M Chen, Y Ding… - ACM Transactions on …, 2022 - dl.acm.org
The intelligent dialogue system, aiming at communicating with humans harmoniously with
natural language, is brilliant for promoting the advancement of human-machine interaction …

Towards Intent-based User Interfaces: Charting the Design Space of Intent-AI Interactions Across Task Types

Z Ding - arXiv preprint arXiv:2404.18196, 2024 - arxiv.org
Technological advances continue to redefine the dynamics of human-machine interactions,
particularly in task execution. This proposal responds to the advancements in Generative AI …

Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning

TN Nguyen, K Jamale, C Gonzalez - … of the AAAI Conference on Human …, 2024 - ojs.aaai.org
Large Language Models (LLMs) excel in tasks from translation to complex
reasoning. For AI systems to help effectively, understanding and predicting human behavior …

Amuse: Human-AI Collaborative Songwriting with Multimodal Inspirations

Y Kim, SJ Lee, C Donahue - arXiv preprint arXiv:2412.18940, 2024 - arxiv.org
Songwriting is often driven by multimodal inspirations, such as imagery, narratives, or
existing music, yet songwriters remain unsupported by current music AI systems in …

Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search

H Tang, Z Zhang, Z Li, Z Zhang, X Wu, L Gao… - arXiv preprint arXiv …, 2025 - arxiv.org
Video Quality Assessment (VQA) is vital for large-scale video retrieval systems, aimed at
identifying quality issues to prioritize high-quality videos. In industrial systems, low-quality …