A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
A survey of speaker recognition: Fundamental theories, recognition methods and opportunities
Humans can identify a speaker by listening to their voice, over the telephone, or on any
digital device. Acquiring this congenital human competency, authentication technologies …
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks
The recently proposed VBx diarization method uses a Bayesian hidden Markov model to
find speaker clusters in a sequence of x-vectors. In this work we perform an extensive …
Speaker diarization with LSTM
For many years, i-vector based audio embedding techniques were the dominant approach
for speaker verification and speaker diarization applications. However, mirroring the rise of …
End-to-end neural speaker diarization with self-attention
Speaker diarization has been mainly developed based on the clustering of speaker
embeddings. However, the clustering-based approach has two major problems, i.e., (i) it is not …
End-to-end neural speaker diarization with permutation-free objectives
In this paper, we propose a novel end-to-end neural-network-based speaker diarization
method. Unlike most existing methods, our proposed method does not have separate …
Fully supervised speaker diarization
In this paper, we propose a fully supervised speaker diarization approach, named
unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker …
End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors
End-to-end speaker diarization for an unknown number of speakers is addressed in this
paper. Recently proposed end-to-end speaker diarization outperformed conventional …
Speaker diarization using deep neural network embeddings
Speaker diarization is an important front-end for many speech technologies in the presence
of multiple speakers, but current methods that employ i-vector clustering for short segments …
x-vectors meet emotions: A study on dependencies between emotion and speaker recognition
In this work, we explore the dependencies between speaker recognition and emotion
recognition. We first show that knowledge learned for speaker recognition can be reused for …