Google Scholar

Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

Cited by 194 · All 5 versions

[PDF] ieee.org

A comprehensive survey on video saliency detection with auditory information: the audio-visual consistency perceptual is the key!

C Chen, M Song, W Song, L Guo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Video saliency detection (VSD) aims at fast locating the most attractive
objects/things/patterns in a given video clip. Existing VSD-related works have mainly relied …

Cited by 27 · All 5 versions

[PDF] thecvf.com

A light weight model for active speaker detection

J Liao, H Duan, K Feng, W Zhao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Active speaker detection is a challenging task in audio-visual scenarios, with the aim to
detect who is speaking in one or more speaker scenarios. This task has received …

Cited by 38 · All 8 versions

[PDF] thecvf.com

Loconet: Long-short context network for active speaker detection

X Wang, F Cheng, G Bertasius - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Active Speaker Detection (ASD) aims to identify who is speaking in each frame of a
video. Solving ASD involves using audio and visual information in two complementary …

Cited by 19 · All 3 versions

Sd-nerf: Towards lifelike talking head animation via spatially-adaptive dual-driven nerfs

S Shen, W Li, X Huang, Z Zhu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Recent years have witnessed great progress in audio-driven talking head animation. Among
these methods, the 3D-based ones better preserve the 3D consistency of the generated …

Cited by 22 · All 2 versions

[PDF] aaai.org

Multi-modal perception attention network with self-supervised learning for audio-visual speaker tracking

Y Li, H Liu, H Tang - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org

Multi-modal fusion is proven to be an effective method to improve the accuracy and
robustness of speaker tracking, especially in complex scenarios. However, how to combine …

Cited by 24 · All 9 versions

[PDF] arxiv.org

Speaker recognition with two-step multi-modal deep cleansing

R Tao, KA Lee, Z Shi, H Li - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Neural network-based speaker recognition has achieved significant improvement in recent
years. A robust speaker representation learns meaningful knowledge from both hard and …

Cited by 18 · All 4 versions

[PDF] wei-xue.com

Deep audio-visual beamforming for speaker localization

X Qian, Q Zhang, G Guan, W Xue - IEEE Signal Processing …, 2022 - ieeexplore.ieee.org

Generalized Cross Correlation (GCC) is the most popular localization technique over the
past decades and can be extended with the beamforming method, e.g., Steered Response …

Cited by 14 · All 3 versions

[PDF] acm.org

Multi-stage Face-voice Association Learning with Keynote Speaker Diarization

R Tao, Z Shi, Y Jiang, DT Truong, ES Chng… - Proceedings of the …, 2024 - dl.acm.org

The human brain has the capability to associate the unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …

Cited by 2 · All 3 versions

[PDF] mdpi.com

Audiovisual Tracking of Multiple Speakers in Smart Spaces

F Sanabria-Macias, M Marron-Romera… - Sensors, 2023 - mdpi.com

This paper presents GAVT, a highly accurate audiovisual 3D tracking system based on
particle filters and a probabilistic framework, employing a single camera and a microphone …

Cited by 2 · All 7 versions

Audio-visual tracking of concurrent speakers
