A light weight model for active speaker detection
Active speaker detection is a challenging task in audio-visual scenarios, with the aim to
detect who is speaking in one or more speaker scenarios. This task has received …
Egocentric auditory attention localization in conversations
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …
Target active speaker detection with audio-visual cues
In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …
Loconet: Long-short context network for active speaker detection
Active Speaker Detection (ASD) aims to identify who is speaking in each frame of a
video. Solving ASD involves using audio and visual information in two complementary …
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Large pretrained models are increasingly crucial in modern computer vision tasks. These
models are typically used in downstream tasks by end-to-end finetuning which is highly …
Egoloc: Revisiting 3d object localization from egocentric videos with visual queries
With the recent advances in video and 3D understanding, novel 4D spatio-temporal
methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic …
Listen to look into the future: Audio-visual egocentric gaze anticipation
Egocentric gaze anticipation serves as a key building block for the emerging capability of
Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals …
Joint audio-visual idling vehicle detection with streamlined input dependencies
Idling vehicle detection (IVD) can be helpful in monitoring and reducing unnecessary idling
and can be integrated into real-time systems to address the resulting pollution and harmful …
BIAS: A Body-based Interpretable Active Speaker Approach
State-of-the-art Active Speaker Detection (ASD) approaches heavily rely on audio and facial
features to perform, which is not a sustainable approach in wild scenarios. Although these …
Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges
Nowadays, the large amount of audio-visual content available has fostered the need to
develop new robust automatic speaker diarization systems to analyse and characterise it …