A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Fairseq S2T: Fast speech-to-text modeling with fairseq

C Wang, Y Tang, X Ma, A Wu, S Popuri… - arxiv preprint arxiv …, 2020 - arxiv.org
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such
as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful …

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arxiv preprint arxiv …, 2023 - arxiv.org
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arxiv preprint arxiv …, 2023 - arxiv.org
Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …

Findings of the IWSLT 2022 Evaluation Campaign.

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks:(i) Simultaneous speech translation,(ii) Offline …

Recent advances in direct speech-to-text translation

C Xu, R Ye, Q Dong, C Zhao, T Ko, M Wang… - arxiv preprint arxiv …, 2023 - arxiv.org
Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …

Multilingual speech translation with efficient finetuning of pretrained models

X Li, C Wang, Y Tang, C Tran, Y Tang, J Pino… - arxiv preprint arxiv …, 2020 - arxiv.org
We present a simple yet effective approach to build multilingual speech-to-text (ST)
translation by efficient transfer learning from pretrained speech encoder and text decoder …

The multilingual tedx corpus for speech recognition and translation

E Salesky, M Wiesner, J Bremerman, R Cattoni… - arxiv preprint arxiv …, 2021 - arxiv.org
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and
speech translation (ST) research across many non-English source languages. The corpus is …

Cross-modal contrastive learning for speech translation

R Ye, M Wang, L Li - arxiv preprint arxiv:2205.02444, 2022 - arxiv.org
How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …

Unity: Two-pass direct speech-to-speech translation with discrete units

H Inaguma, S Popuri, I Kulikov, PJ Chen… - arxiv preprint arxiv …, 2022 - arxiv.org
Direct speech-to-speech translation (S2ST), in which all components can be optimized
jointly, is advantageous over cascaded approaches to achieve fast inference with a …