Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Intermediate loss regularization for ctc-based speech recognition
We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …
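
A minimal sketch of the intermediate-CTC idea described in this entry, assuming a PyTorch-style setup in which the encoder exposes an intermediate layer's frame-level outputs; the blank index, interpolation weight, and tensor layout are illustrative assumptions, not the paper's exact configuration.

```python
import torch.nn.functional as F

def intermediate_ctc_loss(final_logits, inter_logits, targets,
                          input_lengths, target_lengths, inter_weight=0.3):
    """Combine the standard CTC loss on the final encoder layer with an
    auxiliary CTC loss computed from an intermediate encoder layer.

    final_logits, inter_logits: (T, B, V) unnormalized scores over the
    vocabulary (blank assumed at index 0 here, an illustrative choice).
    """
    loss_final = F.ctc_loss(final_logits.log_softmax(-1), targets,
                            input_lengths, target_lengths, blank=0)
    loss_inter = F.ctc_loss(inter_logits.log_softmax(-1), targets,
                            input_lengths, target_lengths, blank=0)
    # Interpolate the two CTC objectives; inter_weight is a placeholder value.
    return (1.0 - inter_weight) * loss_final + inter_weight * loss_inter
```
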
Attention-inspired artificial neural networks for speech processing: A systematic review
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …
From senones to chenones: Tied context-dependent graphemes for hybrid speech recognition
There is an implicit assumption that traditional hybrid approaches for automatic speech
recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to …
Improving ctc-based speech recognition via knowledge transferring from pre-trained language models
Recently, end-to-end automatic speech recognition models based on connectionist temporal
classification (CTC) have achieved impressive results, especially when fine-tuned from …
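
The snippet above only hints at how knowledge from a pre-trained language model is transferred, so the following is a hedged sketch of one plausible variant (not necessarily the paper's method): the usual CTC objective is combined with a representation-matching term that pulls pooled, projected encoder states toward frozen LM embeddings of the transcript. The names proj and kd_weight, and the mean-pooling choice, are hypothetical.

```python
import torch.nn.functional as F

def ctc_with_lm_knowledge(asr_states, final_log_probs, lm_embeddings, targets,
                          input_lengths, target_lengths, proj, kd_weight=0.5):
    """Illustrative multi-task objective: CTC loss plus a cosine-based
    representation-matching term against a frozen pre-trained LM.

    asr_states:    (B, T, D_asr) encoder hidden states
    final_log_probs: (T, B, V) log-probabilities for CTC
    lm_embeddings: (B, D_lm) frozen LM sentence embeddings of the transcripts
    proj:          a projection module mapping D_asr -> D_lm (hypothetical)
    """
    ctc = F.ctc_loss(final_log_probs, targets, input_lengths, target_lengths, blank=0)
    pooled = asr_states.mean(dim=1)  # simple mean pooling over time frames
    kd = 1.0 - F.cosine_similarity(proj(pooled), lm_embeddings, dim=-1).mean()
    return ctc + kd_weight * kd
```
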
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.
The end-to-end (E2E) model allows for training of automatic speech recognition (ASR)
systems without having to consider the acoustic model, lexicon, language model and …
Spoken dialogue system for a human-like conversational robot ERICA
T Kawahara - 9th International Workshop on Spoken Dialogue …, 2019 - Springer
This article gives an overview of our symbiotic human-robot interaction project, which aims
at an autonomous android who behaves and interacts just like a human. A conversational …
Leveraging sequence-to-sequence speech synthesis for enhancing acoustic-to-word speech recognition
Encoder-decoder models for acoustic-to-word (A2W) automatic speech recognition (ASR)
are attractive for their simplicity of architecture and run-time latency while achieving state-of …
Multiresolution and multimodal speech recognition with transformers
This paper presents an audio visual automatic speech recognition (AV-ASR) system using a
Transformer-based architecture. We particularly focus on the scene context provided by the …
4D ASR: Joint modeling of CTC, attention, transducer, and mask-predict decoders
The network architecture of end-to-end (E2E) automatic speech recognition (ASR) can be
classified into several models, including connectionist temporal classification (CTC) …
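
As a rough illustration of joint modeling over a shared encoder, the sketch below combines the four decoder losses (CTC, attention, transducer, mask-predict) with a weighted sum; the equal weights and the plain-sum formulation are assumptions for illustration, not values or details taken from the paper.

```python
import torch

def four_decoder_loss(loss_ctc, loss_att, loss_transducer, loss_maskpredict,
                      weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical joint objective: weighted sum of the four decoder losses
    computed on top of a shared encoder. Weights are placeholders."""
    losses = torch.stack([loss_ctc, loss_att, loss_transducer, loss_maskpredict])
    w = torch.tensor(weights, dtype=losses.dtype, device=losses.device)
    return (w * losses).sum()
```
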