A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
learning. The use of multiple processing layers has enabled the creation of models capable …
Fairseq S2T: Fast speech-to-text modeling with fairseq
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such
as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful …
as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful …
SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …
between any two languages? While recent breakthroughs in text-based models have …
Seamless: Multilingual Expressive and Streaming Speech Translation
Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …
mediated communication feel seamless when compared to human-to-human dialogue. In …
Findings of the IWSLT 2022 Evaluation Campaign.
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks:(i) Simultaneous speech translation,(ii) Offline …
Translation featured eight shared tasks:(i) Simultaneous speech translation,(ii) Offline …
Recent advances in direct speech-to-text translation
Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …
Multilingual speech translation with efficient finetuning of pretrained models
We present a simple yet effective approach to build multilingual speech-to-text (ST)
translation by efficient transfer learning from pretrained speech encoder and text decoder …
translation by efficient transfer learning from pretrained speech encoder and text decoder …
The multilingual tedx corpus for speech recognition and translation
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and
speech translation (ST) research across many non-English source languages. The corpus is …
speech translation (ST) research across many non-English source languages. The corpus is …
Cross-modal contrastive learning for speech translation
How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …
Learning similar representations for semantically similar speech and text is important for …
Unity: Two-pass direct speech-to-speech translation with discrete units
Direct speech-to-speech translation (S2ST), in which all components can be optimized
jointly, is advantageous over cascaded approaches to achieve fast inference with a …
jointly, is advantageous over cascaded approaches to achieve fast inference with a …