A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Supervised speech separation based on deep learning: An overview

DL Wang, J Chen - IEEE/ACM transactions on audio, speech …, 2018 - ieeexplore.ieee.org
Speech separation is the task of separating target speech from background interference.
Traditionally, speech separation is studied as a signal processing problem. A more recent …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - ar** for single-and multi-channel speech enhancement and robust ASR
ZQ Wang, P Wang, DL Wang - IEEE/ACM transactions on …, 2020 - ieeexplore.ieee.org
This study proposes a complex spectral map** approach for single-and multi-channel
speech enhancement, where deep neural networks (DNNs) are used to predict the real and …

Multi-channel deep clustering: Discriminative spectral and spatial embeddings for speaker-independent speech separation

ZQ Wang, J Le Roux, JR Hershey - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
The recently-proposed deep clustering algorithm represents a fundamental advance
towards solving the cocktail party problem in the single-channel case. When multiple …

Speakerbeam: Speaker aware neural network for target speaker extraction in speech mixtures

K Žmolíková, M Delcroix, K Kinoshita… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
The processing of speech corrupted by interfering overlap** speakers is one of the
challenging problems with regards to today's automatic speech recognition systems …

End-to-end microphone permutation and number invariant multi-channel speech separation

Y Luo, Z Chen, N Mesgarani… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
An important problem in ad-hoc microphone speech separation is how to guarantee the
robustness of a system with respect to the locations and numbers of microphones. The …