A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
Encoder-decoder based attractors for end-to-end neural diarization
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …
DiaPer: End-to-end neural diarization with perceiver-based attractors
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to
their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to …
Cross-channel attention-based target speaker voice activity detection: Experimental results for the M2MeT challenge
DukeECE. As the highly overlapped speech exists in the dataset, we employ an x-vector-
based target-speaker voice activity detection (TS-VAD) to find the overlap between …
Towards neural diarization for unlimited numbers of speakers using global and local attractors
Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully
tuned conventional clustering-based methods on challenging datasets. However, the main …
DiarizationLM: Speaker diarization post-processing with large language models
In this paper, we introduce DiarizationLM, a framework to leverage large language models
(LLM) to post-process the outputs from a speaker diarization system. Various goals can be …
From simulated mixtures to simulated conversations as training data for end-to-end neural diarization
End-to-end neural diarization (EEND) is nowadays one of the most prominent research
topics in speaker diarization. EEND presents an attractive alternative to standard cascaded …
The Hitachi-JHU DIHARD III system: Competitive end-to-end neural diarization and x-vector clustering systems combined by DOVER-Lap
This paper provides a detailed description of the Hitachi-JHU system that was submitted to
the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results …
Online neural diarization of unlimited numbers of speakers using global and local attractors
A method to perform offline and online speaker diarization for an unlimited number of
speakers is described in this paper. End-to-end neural diarization (EEND) has achieved …
Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation
We propose to address online speaker diarization as a combination of incremental
clustering and local diarization applied to a rolling buffer updated every 500ms. Every single …