A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Encoder-decoder based attractors for end-to-end neural diarization

S Horiguchi, Y Fujita, S Watanabe… - … /ACM Transactions on …, 2022 - ieeexplore.ieee.org
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …

DiaPer: End-to-end neural diarization with perceiver-based attractors

F Landini, T Stafylakis, L Burget - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to
their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to …

Cross-channel attention-based target speaker voice activity detection: Experimental results for the M2MeT challenge

W Wang, X Qin, M Li - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
… DukeECE. As the highly overlapped speech exists in the dataset, we employ an
x-vector-based target-speaker voice activity detection (TS-VAD) to find the overlap between …

Towards neural diarization for unlimited numbers of speakers using global and local attractors

S Horiguchi, S Watanabe, P García… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully
tuned conventional clustering-based methods on challenging datasets. However, the main …

DiarizationLM: Speaker diarization post-processing with large language models

Q Wang, Y Huang, G Zhao, E Clark, W Xia… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce DiarizationLM, a framework to leverage large language models
(LLM) to post-process the outputs from a speaker diarization system. Various goals can be …

From simulated mixtures to simulated conversations as training data for end-to-end neural diarization

F Landini, A Lozano-Diez, M Diez, L Burget - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-end neural diarization (EEND) is nowadays one of the most prominent research
topics in speaker diarization. EEND presents an attractive alternative to standard cascaded …

The Hitachi-JHU DIHARD III system: Competitive end-to-end neural diarization and x-vector clustering systems combined by DOVER-Lap

S Horiguchi, N Yalta, P Garcia, Y Takashima… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper provides a detailed description of the Hitachi-JHU system that was submitted to
the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results …

Online neural diarization of unlimited numbers of speakers using global and local attractors

S Horiguchi, S Watanabe, P García… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
A method to perform offline and online speaker diarization for an unlimited number of
speakers is described in this paper. End-to-end neural diarization (EEND) has achieved …

Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation

JM Coria, H Bredin, S Ghannay… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
We propose to address online speaker diarization as a combination of incremental
clustering and local diarization applied to a rolling buffer updated every 500ms. Every single …