Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

Target-speaker voice activity detection: A novel approach for multi-speaker diarization in a dinner party scenario

I Medennikov, M Korenevsky, T Prisyach… - ar…
… speech in a diarization system.
First, we detail a neural Long Short-Term Memory-based architecture for overlap detection …

Voice activity detection in the wild: A data-driven approach using teacher-student training

H Dinkel, S Wang, X Xu, M Wu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Voice activity detection is an essential pre-processing component for speech-related tasks
such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain …

Personalized PercepNet: Real-time, low-complexity target voice separation and enhancement

R Giri, S Venkataramani, JM Valin, U Isik… - arXiv preprint arXiv …, 2021 - arxiv.org
The presence of multiple talkers in the surrounding environment poses a difficult challenge
for real-time speech communication systems considering the constraints on network size …

MarbleNet: Deep 1D time-channel separable convolutional neural network for voice activity detection

F Jia, S Majumdar, B Ginsburg - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We present MarbleNet, an end-to-end neural network for Voice Activity Detection (VAD).
MarbleNet is a deep residual network composed from blocks of 1D time-channel separable …

End-to-end active speaker detection

JL Alcázar, M Cordes, C Zhao, B Ghanem - European Conference on …, 2022 - Springer
Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage
process: feature extraction and spatio-temporal context aggregation. In this paper, we …