Audio-visual instance segmentation
In this paper, we propose a new multi-modal task, termed audio-visual instance
segmentation (AVIS), which aims to simultaneously identify, segment and track individual …
Toward Long Form Audio-Visual Video Understanding
We live in a world filled with never-ending streams of multimodal information. As a more
natural recording of the real scenario, long form audio-visual videos (LFAVs) are expected …
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
The Audio Visual Question Answering (AVQA) task aims to answer questions related to
various visual objects, sounds, and their interactions in videos. Such naturally multimodal …
LINK: Adaptive Modality Interaction for Audio-Visual Video Parsing
L Wang, B Zhu, Y Chen, J Wang - arXiv preprint arXiv:2412.20872, 2024 - arxiv.org
Audio-visual video parsing focuses on classifying videos through weak labels while
identifying events as either visible, audible, or both, alongside their respective temporal …