Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li

Audio-visual grouping network for sound localization from mixtures

S Mo, Y Tian - Proceedings of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Sound source localization is a typical and challenging task that predicts the location of
sound sources in a video. Previous single-source methods mainly used the audio-visual …

Audio-visual class-incremental learning

W Pian, S Mo, Y Guo, Y Tian - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we introduce audio-visual class-incremental learning, a class-incremental
learning scenario for audio-visual video recognition. We demonstrate that joint audio-visual …

Multimodal variational auto-encoder based audio-visual segmentation

Y Mao, J Zhang, M Xiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose an Explicit Conditional Multimodal Variational Auto-Encoder
(ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the …

CATR: Combinatorial-dependence audio-queried transformer for audio-visual video segmentation

K Li, Z Yang, L Chen, Y Yang, J Xiao - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Audio-visual video segmentation (AVVS) aims to generate pixel-level maps of sound-producing
objects within image frames and ensure the maps faithfully adhere to the given …

Unified multisensory perception: Weakly-supervised audio-visual video parsing

Y Tian, D Li, C Xu - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
In this paper, we introduce a new problem, named audio-visual video parsing, which aims to
parse a video into temporal event segments and label them as either audible, visible, or …

AVSegFormer: Audio-visual segmentation with transformer

S Gao, Z Chen, G Chen, W Wang, T Lu - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Audio-visual segmentation (AVS) aims to locate and segment the sounding objects in a
given video, which demands audio-driven pixel-level scene understanding. The existing …