Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization

J Thienpondt, K Demuynck - arXiv preprint arXiv:2405.09142, 2024 - arxiv.org
Current speaker diarization systems rely on an external voice activity detection model prior
to speaker embedding extraction on the detected speech segments. In this paper, we …

Disentangled Representation Learning for Environment-agnostic Speaker Recognition

KH Nam, HS Heo, J Jung, JS Chung - arXiv preprint arXiv:2406.14559, 2024 - arxiv.org
This work presents a framework based on feature disentanglement to learn speaker
embeddings that are robust to environmental variations. Our framework utilises an auto …

Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization

P Pálka, F Landini, D Klement, M Diez… - arXiv preprint arXiv …, 2024 - arxiv.org
In spite of the popularity of end-to-end diarization systems nowadays, modular systems
comprised of voice activity detection (VAD), speaker embedding extraction plus clustering …