A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

A survey of speaker recognition: Fundamental theories, recognition methods and opportunities

MM Kabir, MF Mridha, J Shin, I Jahan, AQ Ohi - IEEE Access, 2021 - ieeexplore.ieee.org
Humans can identify a speaker by listening to their voice, over the telephone, or on any
digital devices. Acquiring this congenital human competency, authentication technologies …

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks

F Landini, J Profant, M Diez, L Burget - Computer Speech & Language, 2022 - Elsevier
The recently proposed VBx diarization method uses a Bayesian hidden Markov model to
find speaker clusters in a sequence of x-vectors. In this work we perform an extensive …

Speaker diarization with LSTM

Q Wang, C Downey, L Wan… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
For many years, i-vector based audio embedding techniques were the dominant approach
for speaker verification and speaker diarization applications. However, mirroring the rise of …

End-to-end neural speaker diarization with self-attention

Y Fujita, N Kanda, S Horiguchi, Y Xue… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
Speaker diarization has been mainly developed based on the clustering of speaker
embeddings. However, the clustering-based approach has two major problems; i.e., (i) it is not …

End-to-end neural speaker diarization with permutation-free objectives

Y Fujita, N Kanda, S Horiguchi, K Nagamatsu… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we propose a novel end-to-end neural-network-based speaker diarization
method. Unlike most existing methods, our proposed method does not have separate …

Fully supervised speaker diarization

A Zhang, Q Wang, Z Zhu, J Paisley… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
In this paper, we propose a fully supervised speaker diarization approach, named
unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker …

End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors

S Horiguchi, Y Fujita, S Watanabe, Y Xue… - arXiv preprint arXiv …, 2020 - arxiv.org
End-to-end speaker diarization for an unknown number of speakers is addressed in this
paper. Recently proposed end-to-end speaker diarization outperformed conventional …

Speaker diarization using deep neural network embeddings

D Garcia-Romero, D Snyder, G Sell… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org
Speaker diarization is an important front-end for many speech technologies in the presence
of multiple speakers, but current methods that employ i-vector clustering for short segments …

x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

R Pappagari, T Wang, J Villalba… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
In this work, we explore the dependencies between speaker recognition and emotion
recognition. We first show that knowledge learned for speaker recognition can be reused for …