Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Advancing RNN transducer technology for speech recognition

G Saon, Z Tüske, D Bolanos… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
We investigate a set of techniques for RNN Transducers (RNN-Ts) that were instrumental in
lowering the word error rate on three different tasks (Switchboard 300 hours, conversational …

Prompting large language models for zero-shot domain adaptation in speech recognition

Y Li, Y Wu, J Li, S Liu - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
The integration of Language Models (LMs) has proven to be an effective way to address
domain shifts in speech recognition. However, these approaches usually require a …

Internal language model estimation for domain-adaptive end-to-end speech recognition

Z Meng, S Parthasarathy, E Sun, Y Gaur… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
The external language models (LM) integration remains a challenging task for end-to-end
(E2E) automatic speech recognition (ASR) which has no clear division between acoustic …

Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

D Le, M Jain, G Keren, S Kim, Y Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …

JOIST: A joint speech and text streaming model for ASR

TN Sainath, R Prabhavalkar, A Bapna… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
We present JOIST, an algorithm to train a streaming, cascaded, encoder end-to-end (E2E)
model with both speech-text paired inputs, and text-only unpaired inputs. Unlike previous …

SynthASR: Unlocking synthetic data for speech recognition

A Fazel, W Yang, Y Liu, R Barra-Chicote… - arXiv preprint arXiv …, 2021 - arxiv.org
End-to-end (E2E) automatic speech recognition (ASR) models have recently demonstrated
superior performance over the traditional hybrid ASR models. Training an E2E ASR model …

Tied & reduced RNN-T decoder

R Botros, TN Sainath, R David, E Guzman, W Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Previous works on the Recurrent Neural Network-Transducer (RNN-T) models have shown
that, under some conditions, it is possible to simplify its prediction network with little or no …