Google Scholar

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier

Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Cited by 420, 7 versions

[PDF] mlr.press

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press

We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Cited by 3882, 11 versions

[PDF] arxiv.org

BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition

Y Zhang, DS Park, W Han, J Qin… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …

Cited by 204, 4 versions

[PDF] vut.cz

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks

F Landini, J Profant, M Diez, L Burget - Computer Speech & Language, 2022 - Elsevier

The recently proposed VBx diarization method uses a Bayesian hidden Markov model to
find speaker clusters in a sequence of x-vectors. In this work we perform an extensive …

Cited by 233, 6 versions

[PDF] arxiv.org

Robust self-supervised audio-visual speech recognition

B Shi, WN Hsu, A Mohamed - arXiv preprint arXiv:2201.01763, 2022 - arxiv.org

Audio-based automatic speech recognition (ASR) degrades significantly in noisy
environments and is particularly vulnerable to interfering speech, as the model cannot …

Cited by 126, 5 versions

[PDF] arxiv.org

SpeechStew: Simply mix all available speech recognition data to train one large neural network

W Chan, D Park, C Lee, Y Zhang, Q Le… - arXiv preprint arXiv …, 2021 - arxiv.org

We present SpeechStew, a speech recognition model that is trained on a combination of
various publicly available speech recognition datasets: AMI, Broadcast News, Common …

Cited by 165, 3 versions

[PDF] arxiv.org

The third DIHARD diarization challenge

N Ryant, P Singh, V Krishnamohan, R Varma… - arXiv preprint arXiv …, 2020 - arxiv.org

DIHARD III was the third in a series of speaker diarization challenges intended to improve
the robustness of diarization systems to variability in recording equipment, noise conditions …

Cited by 172, 11 versions

[PDF] arxiv.org

The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios

S Cornell, M Wiesner, S Watanabe, D Raj… - arXiv preprint arXiv …, 2023 - arxiv.org

The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …

Cited by 58, 7 versions

[PDF] arxiv.org

Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario

I Medennikov, M Korenevsky, T Prisyach… - arXiv preprint arXiv …, 2020 - arxiv.org

Speaker diarization for real-life scenarios is an extremely challenging problem. Widely used
clustering-based diarization approaches perform rather poorly in such conditions, mainly …

Cited by 233, 10 versions

[PDF] arxiv.org

Continuous speech separation with conformer

S Chen, Y Wu, Z Chen, J Wu, J Li… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Continuous speech separation was recently proposed to deal with the overlapped speech in
natural conversations. While it was shown to significantly improve the speech recognition …

Cited by 151, 5 versions
