A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
learning. The use of multiple processing layers has enabled the creation of models capable …
A comprehensive review of multimodal large language models: Performance and challenges across different tasks
In an era defined by the explosive growth of data and rapid technological advancements,
Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence …
Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence …
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …
between any two languages? While recent breakthroughs in text-based models have …
Direct speech-to-speech translation with discrete units
We present a direct speech-to-speech translation (S2ST) model that translates speech from
one language to speech in another language without relying on intermediate text …
one language to speech in another language without relying on intermediate text …
STEMM: Self-learning with speech-text manifold mixup for speech translation
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …
with limited labeled data? Existing techniques often attempt to transfer powerful machine …
ESPnet-ST: All-in-one speech translation toolkit
We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …
Multilingual speech translation with efficient finetuning of pretrained models
We present a simple yet effective approach to build multilingual speech-to-text (ST)
translation by efficient transfer learning from pretrained speech encoder and text decoder …
translation by efficient transfer learning from pretrained speech encoder and text decoder …
Enhanced direct speech-to-speech translation using self-supervised pre-training and data augmentation
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues as there
exists little parallel S2ST data, compared to the amount of data available for conventional …
exists little parallel S2ST data, compared to the amount of data available for conventional …
Cascade versus direct speech translation: Do the differences still make a difference?
Five years after the first published proofs of concept, direct approaches to speech translation
(ST) are now competing with traditional cascade solutions. In light of this steady progress …
(ST) are now competing with traditional cascade solutions. In light of this steady progress …
Learning shared semantic space for speech-to-text translation
Having numerous potential applications and great impact, end-to-end speech translation
(ST) has long been treated as an independent task, failing to fully draw strength from the …
(ST) has long been treated as an independent task, failing to fully draw strength from the …