A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …
Fairseq S2T: Fast speech-to-text modeling with fairseq
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such
as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful …
SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …
STEMM: Self-learning with speech-text manifold mixup for speech translation
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …
Recent advances in direct speech-to-text translation
Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …
Seamless: Multilingual Expressive and Streaming Speech Translation
Large-scale automatic speech translation systems today lack key features that help
machine-mediated communication feel seamless when compared to human-to-human dialogue. In …
Unified speech-text pre-training for speech translation and recognition
We describe a method to jointly pre-train speech and text in an encoder-decoder modeling
framework for speech translation and recognition. The proposed method incorporates four …
Multilingual speech translation with efficient finetuning of pretrained models
We present a simple yet effective approach to build multilingual speech-to-text (ST)
translation by efficient transfer learning from pretrained speech encoder and text decoder …
Cross-modal contrastive learning for speech translation
How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …