- Academic Search

Y Wei, D Hu, Y Tian, X Li - ar…

[PDF] thecvf.com

Audio-visual grouping network for sound localization from mixtures

S Mo, Y Tian - Proceedings of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com

Sound source localization is a typical and challenging task that predicts the location of
sound sources in a video. Previous single-source methods mainly used the audio-visual …

Cited by 47 - All 5 versions

[PDF] thecvf.com

Audio-visual class-incremental learning

W Pian, S Mo, Y Guo, Y Tian - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

In this paper, we introduce audio-visual class-incremental learning, a class-incremental
learning scenario for audio-visual video recognition. We demonstrate that joint audio-visual …

Cited by 33 - All 6 versions

[PDF] thecvf.com

Multimodal variational auto-encoder based audio-visual segmentation

Y Mao, J Zhang, M Xiang… - Proceedings of the …, 2023 - openaccess.thecvf.com

We propose an Explicit Conditional Multimodal Variational Auto-Encoder
(ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the …

Cited by 34 - All 5 versions

[PDF] arxiv.org

Catr: Combinatorial-dependence audio-queried transformer for audio-visual video segmentation

K Li, Z Yang, L Chen, Y Yang, J Xiao - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Audio-visual video segmentation (AVVS) aims to generate pixel-level maps of sound-
producing objects within image frames and ensure the maps faithfully adhere to the given …

Cited by 47 - All 4 versions

[PDF] arxiv.org

Unified multisensory perception: Weakly-supervised audio-visual video parsing

Y Tian, D Li, C Xu - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer

In this paper, we introduce a new problem, named audio-visual video parsing, which aims to
parse a video into temporal event segments and label them as either audible, visible, or …

Cited by 191 - All 10 versions

[PDF] aaai.org

Avsegformer: Audio-visual segmentation with transformer

S Gao, Z Chen, G Chen, W Wang, T Lu - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Audio-visual segmentation (AVS) aims to locate and segment the sounding objects in a
given video, which demands audio-driven pixel-level scene understanding. The existing …

Cited by 35 - All 5 versions
