
AVA Active Speaker: An audio-visual dataset for active speaker detection

J Roth, S Chaudhuri, O Klejch, R Marvin… - ICASSP 2020 - 2020 …, 2020 - ieeexplore.ieee.org

Active speaker detection is an important component in video analysis algorithms for
applications such as speaker diarization, video re-targeting for meetings, speech …

Cited by 193

Active speakers in context

JL Alcázar, F Caba, L Mai, F Perazzi… - Proceedings of the …, 2020 - openaccess.thecvf.com

Current methods for active speaker detection focus on modeling audiovisual information
from a single speaker. This strategy can be adequate for addressing single-speaker …

Cited by 98

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com

In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

Cited by 18

How to design a three-stage architecture for audio-visual active speaker detection in the wild

O Köpüklü, M Taseska, G Rigoll - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Successful active speaker detection requires a three-stage pipeline: (i) audio-visual
encoding for all speakers in the clip, (ii) inter-speaker relation modeling between a reference …

Cited by 57

MAAS: Multi-modal assignation for active speaker detection

JL Alcázar, F Caba, AK Thabet… - Proceedings of the …, 2021 - openaccess.thecvf.com

Active speaker detection requires a solid integration of multi-modal cues. While individual
modalities can approximate a solution, accurate predictions can only be achieved by …

Cited by 60

End-to-end active speaker detection

JL Alcázar, M Cordes, C Zhao, B Ghanem - European Conference on …, 2022 - Springer

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage
process: feature extraction and spatio-temporal context aggregation. In this paper, we …

Cited by 33

Cross-modal supervision for learning active speaker detection in video

P Chakravarty, T Tuytelaars - … Amsterdam, The Netherlands, October 11-14 …, 2016 - Springer

In this paper, we show how to use audio to supervise the learning of active speaker
detection in video. Voice Activity Detection (VAD) guides the learning of the vision-based …

Cited by 70

Listen to look into the future: Audio-visual egocentric gaze anticipation

B Lai, F Ryan, W Jia, M Liu, JM Rehg - European Conference on Computer …, 2024 - Springer

Egocentric gaze anticipation serves as a key building block for the emerging capability of
Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals …

Cited by 5