[PDF][PDF] Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
XLS-R: Self-supervised cross-lingual speech representation learning at scale
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …
Unsupervised cross-lingual representation learning for speech recognition
This paper presents XLSR which learns cross-lingual speech representations by pretraining
a single model from the raw waveform of speech in multiple languages. We build on …
a single model from the raw waveform of speech in multiple languages. We build on …
Applying wav2vec2. 0 to speech recognition in various low-resource languages
C Yi, J Wang, N Cheng, S Zhou, B Xu - arxiv preprint arxiv:2012.12121, 2020 - arxiv.org
There are several domains that own corresponding widely used feature extractors, such as
ResNet, BERT, and GPT-x. These models are usually pre-trained on large amounts of …
ResNet, BERT, and GPT-x. These models are usually pre-trained on large amounts of …
Automatic speech recognition for Uyghur, Kazakh, and Kyrgyz: An overview
With the emergence of deep learning, the performance of automatic speech recognition
(ASR) systems has remarkably improved. Especially for resource-rich languages such as …
(ASR) systems has remarkably improved. Especially for resource-rich languages such as …
Multilingual end-to-end speech translation
In this paper, we propose a simple yet effective framework for multilingual end-to-end
speech translation (ST), in which speech utterances in source languages are directly …
speech translation (ST), in which speech utterances in source languages are directly …
Leveraging modality-specific representations for audio-visual speech recognition via reinforcement learning
Audio-visual speech recognition (AVSR) has gained remarkable success for ameliorating
the noise-robustness of speech recognition. Mainstream methods focus on fusing audio and …
the noise-robustness of speech recognition. Mainstream methods focus on fusing audio and …
Massively multilingual adversarial speech recognition
We report on adaptation of multilingual end-to-end speech recognition models trained on as
many as 100 languages. Our findings shed light on the relative importance of similarity …
many as 100 languages. Our findings shed light on the relative importance of similarity …
Hierarchical transfer learning for multilingual, multi-speaker, and style transfer DNN-based TTS on low-resource languages
This work applies a hierarchical transfer learning to implement deep neural network (DNN)-
based multilingual text-to-speech (TTS) for low-resource languages. DNN-based system …
based multilingual text-to-speech (TTS) for low-resource languages. DNN-based system …
Xtreme-s: Evaluating cross-lingual speech representations
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech
representations in many languages. XTREME-S covers four task families: speech …
representations in many languages. XTREME-S covers four task families: speech …