Neural target speech extraction: An overview
K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …
Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …
more speakers. The successful ASD depends on accurate interpretation of short-term and …
Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario
I Medennikov, M Korenevsky, T Prisyach… - ar** speech in a diarization system.
First, we detail a neural Long Short-Term Memory-based architecture for overlap detection …
First, we detail a neural Long Short-Term Memory-based architecture for overlap detection …
Voice activity detection in the wild: A data-driven approach using teacher-student training
Voice activity detection is an essential pre-processing component for speech-related tasks
such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain …
such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain …
Personalized percepnet: Real-time, low-complexity target voice separation and enhancement
The presence of multiple talkers in the surrounding environment poses a difficult challenge
for real-time speech communication systems considering the constraints on network size …
for real-time speech communication systems considering the constraints on network size …
Marblenet: Deep 1d time-channel separable convolutional neural network for voice activity detection
We present MarbleNet, an end-to-end neural network for Voice Activity Detection (VAD).
MarbleNet is a deep residual network composed from blocks of 1D time-channel separable …
MarbleNet is a deep residual network composed from blocks of 1D time-channel separable …
End-to-end active speaker detection
Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage
process: feature extraction and spatio-temporal context aggregation. In this paper, we …
process: feature extraction and spatio-temporal context aggregation. In this paper, we …