Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Real-time target sound extraction

B Veluri, J Chan, M Itani, T Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We present the first neural network model to achieve real-time and streaming target sound
extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture …

Target speech diarization with multimodal prompts

Y Jiang, R Tao, Z Chen, Y Qian, H Li - ar…

… to listen at the cocktail party: Text-guided target speaker extraction

X Hao, J Wu, J Yu, C Xu, KC Tan - arXiv preprint arXiv:2310.07284, 2023 - arxiv.org
Humans can easily isolate a single speaker from a complex acoustic environment, a
capability referred to as the "Cocktail Party Effect." However, replicating this ability has been …

PARIS: Pseudo-AutoRegressIve siamese training for online speech separation

Z Pan, G Wichern, FG Germain, K Saijo, J Le Roux - Proc. Interspeech, 2024 - merl.com
While offline speech separation models have made significant advances, the streaming
regime remains less explored and is typically limited to causal modifications of existing …

Hyperbolic distance-based speech separation

D Petermann, M Kim - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org
In this work, we explore the task of hierarchical distance-based speech separation defined
on a hyperbolic manifold. Based on the recent advent of audio-related tasks performed in …