WavLM: Large-scale self-supervised pre-training for full stack speech processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …
Encoder-decoder based attractors for end-to-end neural diarization
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …
DiaPer: End-to-end neural diarization with perceiver-based attractors
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to
their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to …
From simulated mixtures to simulated conversations as training data for end-to-end neural diarization
End-to-end neural diarization (EEND) is nowadays one of the most prominent research
topics in speaker diarization. EEND presents an attractive alternative to standard cascaded …
Target speaker voice activity detection with transformers and its integration with end-to-end neural diarization
This paper describes a speaker diarization model based on target speaker voice activity
detection (TS-VAD) using transformers. To overcome the original TS-VAD model's drawback …
Online neural diarization of unlimited numbers of speakers using global and local attractors
A method to perform offline and online speaker diarization for an unlimited number of
speakers is described in this paper. End-to-end neural diarization (EEND) has achieved …
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer
Deep neural network-based systems have significantly improved the performance of
speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often …
Frame-wise and overlap-robust speaker embeddings for meeting diarization
Using a Teacher-Student training approach we developed a speaker embedding extraction
system that outputs embeddings at frame rate. Given this high temporal resolution and the …
Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model
Speaker diarization has been investigated extensively as an important central task for
meeting analysis. Recent trend shows that integration of end-to-end neural (EEND)-and …
Speaker overlap-aware neural diarization for multi-party meeting analysis
Recently, hybrid systems of clustering and neural diarization models have been successfully
applied in multi-party meeting analysis. However, current models always treat overlapped …