Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press
We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Unsupervised cross-lingual representation learning for speech recognition

A Conneau, A Baevski, R Collobert… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents XLSR which learns cross-lingual speech representations by pretraining
a single model from the raw waveform of speech in multiple languages. We build on …

Automatic speech recognition: a survey

M Malik, MK Malik, K Mehmood… - Multimedia Tools and …, 2021 - Springer
Recently great strides have been made in the field of automatic speech recognition (ASR) by
using various deep learning techniques. In this study, we present a thorough comparison …

Improving continuous sign language recognition with cross-lingual signs

F Wei, Y Chen - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
This work dedicates to continuous sign language recognition (CSLR), which is a weakly
supervised task dealing with the recognition of continuous signs from videos, without any …

Massively multilingual ASR: 50 languages, 1 model, 1 billion parameters

V Pratap, A Sriram, P Tomasello, A Hannun… - arXiv preprint arXiv …, 2020 - arxiv.org
We study training a single acoustic model for multiple languages with the aim of improving
automatic speech recognition (ASR) performance on low-resource languages, and overall …

Large-scale multilingual speech recognition with a streaming end-to-end model

A Kannan, A Datta, TN Sainath, E Weinstein… - arXiv preprint arXiv …, 2019 - arxiv.org
Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic
speech recognition (ASR) coverage of the world's languages. They have shown …

Lingvo: a modular and scalable framework for sequence-to-sequence modeling

J Shen, P Nguyen, Y Wu, Z Chen, MX Chen… - arXiv preprint arXiv …, 2019 - arxiv.org
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep
learning research, with a particular focus towards sequence-to-sequence models. Lingvo …