Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Self-supervised multimodal learning: A survey

Y Zong, O Mac Aodha, T Hospedales - arXiv preprint arXiv:2304.01008, 2023 - arxiv.org
Multimodal learning, which aims to understand and analyze information from multiple
modalities, has achieved substantial progress in the supervised regime in recent years …

Learning to answer questions in dynamic audio-visual scenarios

G Li, Y Wei, Y Tian, C Xu, JR Wen… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to
answer questions regarding different visual objects, sounds, and their associations in …

A light weight model for active speaker detection

J Liao, H Duan, K Feng, W Zhao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Active speaker detection is a challenging task in audio-visual scenarios, aiming to detect who is speaking in scenarios with one or more speakers. This task has received …

Annotation-free audio-visual segmentation

J Liu, Y Wang, C Ju, C Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
The objective of Audio-Visual Segmentation (AVS) is to localise the sounding
objects within visual scenes by accurately predicting pixel-wise segmentation masks. To …

Audio-visual segmentation via unlabeled frame exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Progressive spatio-temporal perception for audio-visual question answering

G Li, W Hou, D Hu - Proceedings of the 31st ACM international …, 2023 - dl.acm.org
The Audio-Visual Question Answering (AVQA) task aims to answer questions about different
visual objects, sounds, and their associations in videos. Such naturally multi-modal videos …

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

Prompting segmentation with sound is generalizable audio-visual source localizer

Y Wang, W Liu, G Li, J Ding, D Hu, X Li - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Never having seen an object and heard its sound simultaneously, can the model still
accurately localize its visual position from the input audio? In this work, we concentrate on …

Semantic and relation modulation for audio-visual event localization

H Wang, ZJ Zha, L Li, X Chen… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
We study the problem of localizing audio-visual events that are both audible and visible in a
video. Existing works focus on encoding and aligning audio and visual features at the …