[PDF][PDF] Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
learning. The use of multiple processing layers has enabled the creation of models capable …
Google usm: Scaling automatic speech recognition beyond 100 languages
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …
Bigssl: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …
models pre-trained using large, diverse unlabeled datasets containing approximately a …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
learning has brought considerable reductions in word error rate of more than 50% relative …
Contextual adapters for personalized speech recognition in neural transducers
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR)
models is a challenge due to the lack of training data. A standard way to address this issue …
models is a challenge due to the lack of training data. A standard way to address this issue …
Scaling end-to-end models for large-scale multilingual asr
Building ASR models across many languages is a challenging multi-task learning problem
due to large variations and heavily unbalanced data. Existing work has shown positive …
due to large variations and heavily unbalanced data. Existing work has shown positive …
Joist: A joint speech and text streaming model for asr
We present JOIST, an algorithm to train a streaming, cascaded, encoder end-to-end (E2E)
model with both speech-text paired inputs, and text-only unpaired inputs. Unlike previous …
model with both speech-text paired inputs, and text-only unpaired inputs. Unlike previous …
Tied & reduced rnn-t decoder
Previous works on the Recurrent Neural Network-Transducer (RNN-T) models have shown
that, under some conditions, it is possible to simplify its prediction network with little or no …
that, under some conditions, it is possible to simplify its prediction network with little or no …
Massively multilingual asr: A lifelong learning solution
The development of end-to-end models has largely sped up the research in massively
multilingual automatic speech recognition (MMASR). Previous research has demonstrated …
multilingual automatic speech recognition (MMASR). Previous research has demonstrated …