Google Scholar

Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com

Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

Cited by 128 - All 12 versions

[PDF] neurips.cc

Attention bottlenecks for multimodal fusion

A Nagrani, S Yang, A Arnab, A Jansen… - Advances in neural …, 2021 - proceedings.neurips.cc

Humans perceive the world by concurrently processing and fusing high-dimensional inputs
from multiple modalities such as vision and audio. Machine perception models, in stark …

Cited by 644 - All 8 versions

[PDF] arxiv.org

Wav2CLIP: Learning robust audio representations from CLIP

HH Wu, P Seetharaman, K Kumar… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

We propose Wav2CLIP, a robust audio representation learning method by distilling from
Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on …

Cited by 283 - All 9 versions

[PDF] arxiv.org

VisualVoice: Audio-visual speech separation with cross-modal consistency

R Gao, K Grauman - 2021 IEEE/CVF Conference on Computer …, 2021 - ieeexplore.ieee.org

We introduce a new approach for audio-visual speech separation. Given a video, the goal is
to extract the speech associated with a face in spite of simultaneous background sounds …

Cited by 198 - All 9 versions

[PDF] neurips.cc

A closer look at weakly-supervised audio-visual source localization

S Mo, P Morgado - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Audio-visual source localization is a challenging task that aims to predict the location of
visual sound sources in a video. Since collecting ground-truth annotations of sounding …

Cited by 60 - All 6 versions

[PDF] arxiv.org

Cross-modality fusion transformer for multispectral object detection

F Qingyun, H Dapeng, W Zhaokui - ar…

[PDF] thecvf.com

Audio-visual grouping network for sound localization from mixtures

S Mo, Y Tian - Proceedings of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Sound source localization is a typical and challenging task that predicts the location of
sound sources in a video. Previous single-source methods mainly used the audio-visual …

Cited by 46 - All 5 versions

[PDF] neurips.cc

Exploring cross-video and cross-modality signals for weakly-supervised audio-visual video parsing

YB Lin, HY Tseng, HY Lee, YY Lin… - Advances in Neural …, 2021 - proceedings.neurips.cc

The audio-visual video parsing task aims to temporally parse a video into audio or visual
event categories. However, it is labor intensive to temporally annotate audio and visual …

Cited by 79 - All 11 versions

[PDF] thecvf.com

Sound source localization is all about cross-modal alignment

A Senocak, H Ryu, J Kim, TH Oh… - Proceedings of the …, 2023 - openaccess.thecvf.com

Humans can easily perceive the direction of sound sources in a visual scene, termed sound
source localization. Recent studies on learning-based sound source localization have …

Cited by 15 - All 8 versions
