OmniActions: Predicting digital actions in response to real-world multimodal sensory inputs with LLMs
The progression to “Pervasive Augmented Reality” envisions easy access to multimodal
information continuously. However, in many everyday scenarios, users are occupied …
Data Playwright: Authoring data videos with annotated narration
Creating data videos that effectively narrate stories with animated visuals requires
substantial effort and expertise. A promising research trend is leveraging the easy-to-use …
PodReels: Human-AI Co-Creation of Video Podcast Teasers
Video podcast teasers are short videos that can be shared on social media platforms to
capture interest in full episodes of a video podcast. These teasers enable long-form …
Hookpad Aria: A Copilot for Songwriters
We present Hookpad Aria, a generative AI system designed to assist musicians in writing
Western pop songs. Our system is seamlessly integrated into Hookpad, a web-based editor …
Reframe anything: LLM agent for open world video reframing
The proliferation of mobile devices and social media has revolutionized content
dissemination, with short-form video becoming increasingly prevalent. This shift has …
Enabling harmonious human-machine interaction with visual-context augmented dialogue system: A review
The intelligent dialogue system, aiming at communicating with humans harmoniously with
natural language, is brilliant for promoting the advancement of human-machine interaction …
Towards Intent-based User Interfaces: Charting the Design Space of Intent-AI Interactions Across Task Types
Technological advances continue to redefine the dynamics of human-machine interactions,
particularly in task execution. This proposal responds to the advancements in Generative AI …
Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning
Large Language Models (LLMs) excel in tasks from translation to complex
reasoning. For AI systems to help effectively, understanding and predicting human behavior …
Amuse: Human-AI Collaborative Songwriting with Multimodal Inspirations
Songwriting is often driven by multimodal inspirations, such as imagery, narratives, or
existing music, yet songwriters remain unsupported by current music AI systems in …
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Video Quality Assessment (VQA) is vital for large-scale video retrieval systems, aimed at
identifying quality issues to prioritize high-quality videos. In industrial systems, low-quality …