Transformers in speech processing: A survey
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …
A transformer-based model with self-distillation for multimodal emotion recognition in conversations
Emotion recognition in conversations (ERC), the task of recognizing the emotion of each
utterance in a conversation, is crucial for building empathetic machines. Existing studies …
End-to-end Japanese multi-dialect speech recognition and dialect identification with multi-task learning
End-to-end systems have demonstrated state-of-the-art performance on many tasks related
to automatic speech recognition (ASR) and dialect identification (DID). In this paper, we …
Self-Distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach
Text recognition methods are gaining rapid development. Some advanced techniques, e.g.,
powerful modules, language models, and un- and semi-supervised learning schemes …
Multi-encoder learning and stream fusion for transformer-based end-to-end automatic speech recognition
Stream fusion, also known as system combination, is a common technique in automatic
speech recognition for traditional hybrid hidden Markov model approaches, yet mostly …
Layer pruning on demand with intermediate CTC
Deploying an end-to-end automatic speech recognition (ASR) model on mobile/embedded
devices is a challenging task, since the device computational power and energy …
Alignment knowledge distillation for online streaming attention-based speech recognition
This article describes an efficient training method for online streaming attention-based
encoder-decoder (AED) automatic speech recognition (ASR) systems. AED models have …
Relaxed attention: A simple method to boost performance of end-to-end automatic speech recognition
Recently, attention-based encoder-decoder (AED) models have shown high performance for
end-to-end automatic speech recognition (ASR) across several tasks. Addressing …
A comparative study on neural architectures and training methods for Japanese speech recognition
End-to-end (E2E) modeling is advantageous for automatic speech recognition (ASR)
especially for Japanese since word-based tokenization of Japanese is not trivial, and E2E …
Distilling the Knowledge of BERT for CTC-based ASR
Connectionist temporal classification (CTC)-based models are attractive because of their
fast inference in automatic speech recognition (ASR). Language model (LM) integration …