A consolidated perspective on multimicrophone speech enhancement and source separation

S Gannot, E Vincent… - … /ACM Transactions on …, 2017 - ieeexplore.ieee.org
Speech enhancement and separation are core problems in audio signal processing, with
commercial applications in devices as diverse as mobile phones, conference call systems …

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

Egocentric deep multi-channel audio-visual active speaker localization

H Jiang, C Murdock, VK Ithapu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Augmented reality devices have the potential to enhance human perception and enable
other assistive functionalities in complex conversational environments. Effectively capturing …

Self-motion as supervision for egocentric audiovisual localization

C Murdock, I Ananthabhotla, H Lu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Sound source localization is a key requirement for many assistive applications of
augmented reality, such as speech enhancement. In conversational settings, potential …

Data-driven multi-microphone speaker localization on manifolds

B Laufer-Goldshtein, R Talmon… - … and Trends® in Signal …, 2020 - nowpublishers.com
Speech enhancement is a core problem in audio signal processing with commercial
applications in devices as diverse as mobile phones, conference call systems, smart …

Rethinking audio-visual synchronization for active speaker detection

A Wuerkaixi, Y Zhang, Z Duan… - 2022 IEEE 32nd …, 2022 - ieeexplore.ieee.org
Active speaker detection (ASD) systems are important modules for analyzing multi-talker
conversations. They aim to detect which speakers or none are talking in a visual scene at …

Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges

V Mingote, A Ortega, A Miguel, E Lleida - arXiv preprint arXiv:2409.05659, 2024 - arxiv.org
Nowadays, the large amount of audio-visual content available has fostered the need to
develop new robust automatic speaker diarization systems to analyse and characterise it …

A hybrid approach for speaker tracking based on TDOA and data-driven models

B Laufer-Goldshtein, R Talmon… - IEEE/ACM Transactions …, 2018 - ieeexplore.ieee.org
The problem of speaker tracking in noisy and reverberant enclosures is addressed in this
paper. We present a hybrid algorithm, combining traditional tracking schemes with a new …

Multimodal egocentric analysis of focused interactions

S Bano, T Suveges, J Zhang, SJ McKenna - IEEE Access, 2018 - ieeexplore.ieee.org
Continuous detection of social interactions from wearable sensor data streams has a range
of potential applications in domains, including health and social care, security, and assistive …

Gaussian mixture estimation from weighted samples

D Frisch, UD Hanebeck - 2021 IEEE International Conference …, 2021 - ieeexplore.ieee.org
We consider estimating the parameters of a Gaussian mixture density with a given number
of components best representing a given set of weighted samples. We adopt a density …