Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Automatic speech recognition using advanced deep learning approaches: A survey

H Kheddar, M Hemis, Y Himeur - Information Fusion, 2024 - Elsevier
Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …

Findings of the IWSLT 2022 Evaluation Campaign.

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …

Dual-mode ASR: Unify and improve streaming ASR with full-context modeling

J Yu, W Han, A Gulati, CC Chiu, B Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as
quickly and accurately as possible, while full-context ASR waits for the completion of a full …

Exploring deep transfer learning techniques for Alzheimer's dementia detection

Y Zhu, X Liang, JA Batsis, RM Roth - Frontiers in computer science, 2021 - frontiersin.org
Examination of speech datasets for detecting dementia, collected via various speech tasks,
has revealed links between speech and cognitive abilities. However, the speech dataset …

Transformer-transducers for code-switched speech recognition

S Dalmia, Y Liu, S Ronanki… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We live in a world where 60% of the population can speak two or more languages fluently.
Members of these communities constantly switch between languages when having a …

RealTranS: End-to-end simultaneous speech translation with convolutional weighted-shrinking transformer

X Zeng, L Li, Q Liu - arXiv preprint arXiv:2106.04833, 2021 - arxiv.org
End-to-end simultaneous speech translation (SST), which directly translates speech in one
language into text in another language in real-time, is useful in many scenarios but has not …

SPIRAL: Self-supervised perturbation-invariant representation learning for speech pre-training

W Huang, Z Zhang, YT Yeung, X Jiang… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce a new approach for speech pre-training named SPIRAL which works by
learning denoising representation of perturbed data in a teacher-student framework …

Token-level serialized output training for joint streaming ASR and ST leveraging textual alignments

S Papi, P Wang, J Chen, J Xue, J Li… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In real-world applications, users often require both translations and transcriptions of speech
to enhance their comprehension, particularly in streaming scenarios where incremental …

Uconv-conformer: High reduction of input sequence length for end-to-end speech recognition

A Andrusenko, R Nasretdinov… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Optimization of modern ASR architectures is among the highest priority tasks since it saves
many computational resources for model training and inference. The work proposes a new …