Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Automatic speech recognition using advanced deep learning approaches: A survey
Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …
Findings of the IWSLT 2022 Evaluation Campaign.
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …
Dual-mode ASR: Unify and improve streaming ASR with full-context modeling
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as
quickly and accurately as possible, while full-context ASR waits for the completion of a full …
Exploring deep transfer learning techniques for Alzheimer's dementia detection
Examination of speech datasets for detecting dementia, collected via various speech tasks,
has revealed links between speech and cognitive abilities. However, the speech dataset …
Transformer-transducers for code-switched speech recognition
We live in a world where 60% of the population can speak two or more languages fluently.
Members of these communities constantly switch between languages when having a …
RealTranS: End-to-end simultaneous speech translation with convolutional weighted-shrinking transformer
End-to-end simultaneous speech translation (SST), which directly translates speech in one
language into text in another language in real-time, is useful in many scenarios but has not …
SPIRAL: Self-supervised perturbation-invariant representation learning for speech pre-training
We introduce a new approach for speech pre-training named SPIRAL which works by
learning denoising representation of perturbed data in a teacher-student framework …
Token-level serialized output training for joint streaming ASR and ST leveraging textual alignments
In real-world applications, users often require both translations and transcriptions of speech
to enhance their comprehension, particularly in streaming scenarios where incremental …
Uconv-conformer: High reduction of input sequence length for end-to-end speech recognition
Optimization of modern ASR architectures is among the highest priority tasks since it saves
many computational resources for model training and inference. The work proposes a new …