Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …
more speakers. The successful ASD depends on accurate interpretation of short-term and …
Ava active speaker: An audio-visual dataset for active speaker detection
Active speaker detection is an important component in video analysis algorithms for
applications such as speaker diarization, video re-targeting for meetings, speech …
applications such as speaker diarization, video re-targeting for meetings, speech …
Active speakers in context
Current methods for active speaker detection focus on modeling audiovisual information
from a single speaker. This strategy can be adequate for addressing single-speaker …
from a single speaker. This strategy can be adequate for addressing single-speaker …
Egocentric auditory attention localization in conversations
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …
auditory attention, or the ability to focus on a particular speaker while tuning out others …
How to design a three-stage architecture for audio-visual active speaker detection in the wild
Successful active speaker detection requires a three-stage pipeline:(i) audio-visual
encoding for all speakers in the clip,(ii) inter-speaker relation modeling between a reference …
encoding for all speakers in the clip,(ii) inter-speaker relation modeling between a reference …
Maas: Multi-modal assignation for active speaker detection
Active speaker detection requires a solid integration of multi-modal cues. While individual
modalities can approximate a solution, accurate predictions can only be achieved by …
modalities can approximate a solution, accurate predictions can only be achieved by …
End-to-end active speaker detection
Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage
process: feature extraction and spatio-temporal context aggregation. In this paper, we …
process: feature extraction and spatio-temporal context aggregation. In this paper, we …
Cross-modal supervision for learning active speaker detection in video
In this paper, we show how to use audio to supervise the learning of active speaker
detection in video. Voice Activity Detection (VAD) guides the learning of the vision-based …
detection in video. Voice Activity Detection (VAD) guides the learning of the vision-based …
Listen to look into the future: Audio-visual egocentric gaze anticipation
Egocentric gaze anticipation serves as a key building block for the emerging capability of
Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals …
Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals …
Microphone array for speaker localization and identification in shared autonomous vehicles
With the current technological transformation in the automotive industry, autonomous
vehicles are getting closer to the Society of Automative Engineers (SAE) automation level 5 …
vehicles are getting closer to the Society of Automative Engineers (SAE) automation level 5 …