WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Encoder-decoder based attractors for end-to-end neural diarization

S Horiguchi, Y Fujita, S Watanabe… - … /ACM Transactions on …, 2022 - ieeexplore.ieee.org
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …

DiaPer: End-to-end neural diarization with perceiver-based attractors

F Landini, T Stafylakis, L Burget - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to
their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to …

From simulated mixtures to simulated conversations as training data for end-to-end neural diarization

F Landini, A Lozano-Diez, M Diez, L Burget - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-end neural diarization (EEND) is nowadays one of the most prominent research
topics in speaker diarization. EEND presents an attractive alternative to standard cascaded …

Target speaker voice activity detection with transformers and its integration with end-to-end neural diarization

D Wang, X Xiao, N Kanda… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
This paper describes a speaker diarization model based on target speaker voice activity
detection (TS-VAD) using transformers. To overcome the original TS-VAD model's drawback …

Online neural diarization of unlimited numbers of speakers using global and local attractors

S Horiguchi, S Watanabe, P García… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
A method to perform offline and online speaker diarization for an unlimited number of
speakers is described in this paper. End-to-end neural diarization (EEND) has achieved …

Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer

Z Chen, B Han, S Wang, Y Qian - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Deep neural network-based systems have significantly improved the performance of
speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often …

Frame-wise and overlap-robust speaker embeddings for meeting diarization

T Cord-Landwehr, C Boeddeker… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Using a Teacher-Student training approach we developed a speaker embedding extraction
system that outputs embeddings at frame rate. Given this high temporal resolution and the …

Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model

K Kinoshita, M Delcroix, T Iwata - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Speaker diarization has been investigated extensively as an important central task for
meeting analysis. Recent trend shows that integration of end-to-end neural (EEND)-and …

Speaker overlap-aware neural diarization for multi-party meeting analysis

Z Du, S Zhang, S Zheng, Z Yan - arXiv preprint arXiv:2211.10243, 2022 - arxiv.org
Recently, hybrid systems of clustering and neural diarization models have been successfully
applied in multi-party meeting analysis. However, current models always treat overlapped …