A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
Robust speech recognition via large-scale weak supervision
We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …
BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks
The recently proposed VBx diarization method uses a Bayesian hidden Markov model to
find speaker clusters in a sequence of x-vectors. In this work we perform an extensive …
Robust self-supervised audio-visual speech recognition
Audio-based automatic speech recognition (ASR) degrades significantly in noisy
environments and is particularly vulnerable to interfering speech, as the model cannot …
SpeechStew: Simply mix all available speech recognition data to train one large neural network
We present SpeechStew, a speech recognition model that is trained on a combination of
various publicly available speech recognition datasets: AMI, Broadcast News, Common …
The third DIHARD diarization challenge
DIHARD III was the third in a series of speaker diarization challenges intended to improve
the robustness of diarization systems to variability in recording equipment, noise conditions …
The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios
The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …
Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario
I Medennikov, M Korenevsky, T Prisyach… - arXiv preprint arXiv …, 2020 - arxiv.org
Speaker diarization for real-life scenarios is an extremely challenging problem. Widely used
clustering-based diarization approaches perform rather poorly in such conditions, mainly …
Continuous speech separation with conformer
Continuous speech separation was recently proposed to deal with the overlapped speech in
natural conversations. While it was shown to significantly improve the speech recognition …