Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition

Q Shao, P Guo, J Yan, P Hu… - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Accents pose significant challenges for speech recognition systems. Although joint
automatic speech recognition (ASR) and accent recognition (AR) training has been proven …

Non-autoregressive error correction for CTC-based ASR with phone-conditioned masked LM

H Futami, H Inaguma, S Ueno, M Mimura… - arXiv preprint arXiv …, 2022 - arxiv.org
Connectionist temporal classification (CTC)-based models are attractive in automatic
speech recognition (ASR) because of their non-autoregressive nature. To take advantage of …

Improving deliberation by text-only and semi-supervised training

K Hu, TN Sainath, Y He, R Prabhavalkar… - arXiv preprint arXiv …, 2022 - arxiv.org
Text-only and semi-supervised training based on audio-only data has gained popularity
recently due to the wide availability of unlabeled text and speech data. In this work, we …