A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
Speaker diarization with LSTM
For many years, i-vector based audio embedding techniques were the dominant approach
for speaker verification and speaker diarization applications. However, mirroring the rise of …
Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario
I Medennikov, M Korenevsky, T Prisyach… - arXiv preprint arXiv …, 2020 - arxiv.org
Speaker diarization for real-life scenarios is an extremely challenging problem. Widely used
clustering-based diarization approaches perform rather poorly in such conditions, mainly …
Fully supervised speaker diarization
In this paper, we propose a fully supervised speaker diarization approach, named
unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker …
Auto-tuning spectral clustering for speaker diarization using normalized maximum eigengap
In this study, we propose a new spectral clustering framework that can auto-tune the
parameters of the clustering algorithm in the context of speaker diarization. The proposed …
An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings
We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …
Unsupervised methods for speaker diarization: An integrated and iterative approach
In speaker diarization, standard approaches typically perform speaker clustering on some
initial segmentation before refining the segment boundaries in a re-segmentation step to …
Incremental spectral clustering by efficiently updating the eigen-system
In recent years, the spectral clustering method has gained attention because of its superior
performance. To the best of our knowledge, the existing spectral clustering algorithms …
Discriminative neural clustering for speaker diarisation
In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data
clustering with a maximum number of clusters as a supervised sequence-to-sequence …
Target-speaker voice activity detection with improved i-vector estimation for unknown number of speaker
Target-speaker voice activity detection (TS-VAD) has recently shown promising results for
speaker diarization on highly overlapped speech. However, the original model requires a …