Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Intermediate loss regularization for CTC-based speech recognition

J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …

Attention-inspired artificial neural networks for speech processing: A systematic review

N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …

From senones to chenones: Tied context-dependent graphemes for hybrid speech recognition

D Le, X Zhang, W Zheng, C Fügen… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
There is an implicit assumption that traditional hybrid approaches for automatic speech
recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to …

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

K Deng, S Cao, Y Zhang, L Ma, G Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Recently, end-to-end automatic speech recognition models based on connectionist temporal
classification (CTC) have achieved impressive results, especially when fine-tuned from …

Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation

S Li, R Dabre, X Lu, P Shen, T Kawahara, H Kawai - Interspeech, 2019 - isca-archive.org
The end-to-end (E2E) model allows for training of automatic speech recognition (ASR)
systems without having to consider the acoustic model, lexicon, language model and …

Spoken dialogue system for a human-like conversational robot ERICA

T Kawahara - 9th International Workshop on Spoken Dialogue …, 2019 - Springer
This article gives an overview of our symbiotic human-robot interaction project, which aims
at an autonomous android who behaves and interacts just like a human. A conversational …

Leveraging sequence-to-sequence speech synthesis for enhancing acoustic-to-word speech recognition

M Mimura, S Ueno, H Inaguma, S Sakai… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Encoder-decoder models for acoustic-to-word (A2W) automatic speech recognition (ASR)
are attractive for their simplicity of architecture and run-time latency while achieving state-of …

Multiresolution and multimodal speech recognition with transformers

G Paraskevopoulos, S Parthasarathy, A Khare… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents an audio visual automatic speech recognition (AV-ASR) system using a
Transformer-based architecture. We particularly focus on the scene context provided by the …

4D ASR: Joint modeling of CTC, attention, transducer, and mask-predict decoders

Y Sudo, M Shakeel, B Yan, J Shi… - arXiv preprint arXiv …, 2022 - arxiv.org
The network architecture of end-to-end (E2E) automatic speech recognition (ASR) can be
classified into several models, including connectionist temporal classification (CTC) …