Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

A review on big data based on deep neural network approaches

M Rithani, RP Kumar, S Doss - Artificial Intelligence Review, 2023 - Springer
Big data analytics has become a significant trend for many businesses as a result of the
daily acquisition of enormous volumes of data. This information has been gathered because …

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

The Multilingual TEDx corpus for speech recognition and translation

E Salesky, M Wiesner, J Bremerman, R Cattoni… - arXiv preprint arXiv …, 2021 - arxiv.org
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and
speech translation (ST) research across many non-English source languages. The corpus is …

ESPnet-ST: All-in-one speech translation toolkit

H Inaguma, S Kiyono, K Duh, S Karita… - arXiv preprint arXiv …, 2020 - arxiv.org
We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …

Cascade versus direct speech translation: Do the differences still make a difference?

L Bentivogli, M Cettolo, M Gaido, A Karakanta… - arXiv preprint arXiv …, 2021 - arxiv.org
Five years after the first published proofs of concept, direct approaches to speech translation
(ST) are now competing with traditional cascade solutions. In light of this steady progress …

Improving speech translation by understanding and learning from the auxiliary text translation task

Y Tang, J Pino, X Li, C Wang, D Genzel - arXiv preprint arXiv:2107.05782, 2021 - arxiv.org
Pretraining and multitask learning are widely used to improve the speech to text translation
performance. In this study, we are interested in training a speech to text translation model …

Learning shared semantic space for speech-to-text translation

C Han, M Wang, H Ji, L Li - arXiv preprint arXiv:2105.03095, 2021 - arxiv.org
Having numerous potential applications and great impact, end-to-end speech translation
(ST) has long been treated as an independent task, failing to fully draw strength from the …

Revisiting end-to-end speech-to-text translation from scratch

B Zhang, B Haddow… - … conference on machine …, 2022 - proceedings.mlr.press
End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its
encoder and/or decoder using source transcripts via speech recognition or text translation …

Speech translation and the end-to-end promise: Taking stock of where we are

M Sperber, M Paulik - arXiv preprint arXiv:2004.06358, 2020 - arxiv.org
Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …