Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Adaptation algorithms for neural network-based speech recognition: An overview
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …
Scaling speech technology to 1,000+ languages
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …
Robust speech recognition via large-scale weak supervision
We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …
Unsupervised cross-lingual representation learning for speech recognition
This paper presents XLSR which learns cross-lingual speech representations by pretraining
a single model from the raw waveform of speech in multiple languages. We build on …
Automatic speech recognition: a survey
Recently great strides have been made in the field of automatic speech recognition (ASR) by
using various deep learning techniques. In this study, we present a thorough comparison …
Improving continuous sign language recognition with cross-lingual signs
This work is dedicated to continuous sign language recognition (CSLR), which is a weakly
supervised task dealing with the recognition of continuous signs from videos, without any …
Massively multilingual ASR: 50 languages, 1 model, 1 billion parameters
We study training a single acoustic model for multiple languages with the aim of improving
automatic speech recognition (ASR) performance on low-resource languages, and over-all …
Large-scale multilingual speech recognition with a streaming end-to-end model
Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic
speech recognition (ASR) coverage of the world's languages. They have shown …
Lingvo: a modular and scalable framework for sequence-to-sequence modeling
Lingvo is a Tensorflow framework offering a complete solution for collaborative deep
learning research, with a particular focus towards sequence-to-sequence models. Lingvo …