Recent advances in direct speech-to-text translation

C Xu, R Ye, Q Dong, C Zhao, T Ko, M Wang… - arxiv preprint arxiv …, 2023 - arxiv.org
Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …

STEMM: Self-learning with speech-text manifold mixup for speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arxiv preprint arxiv:2203.10426, 2022 - arxiv.org
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …

ESPnet-ST: All-in-one speech translation toolkit

H Inaguma, S Kiyono, K Duh, S Karita… - arxiv preprint arxiv …, 2020 - arxiv.org
We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Abstract Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Cascade versus direct speech translation: Do the differences still make a difference?

L Bentivogli, M Cettolo, M Gaido, A Karakanta… - arxiv preprint arxiv …, 2021 - arxiv.org
Five years after the first published proofs of concept, direct approaches to speech translation
(ST) are now competing with traditional cascade solutions. In light of this steady progress …

Learning shared semantic space for speech-to-text translation

C Han, M Wang, H Ji, L Li - arxiv preprint arxiv:2105.03095, 2021 - arxiv.org
Having numerous potential applications and great impact, end-to-end speech translation
(ST) has long been treated as an independent task, failing to fully draw strength from the …

Speech translation and the end-to-end promise: Taking stock of where we are

M Sperber, M Paulik - arxiv preprint arxiv:2004.06358, 2020 - arxiv.org
Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …

Curriculum pre-training for end-to-end speech translation

C Wang, Y Wu, S Liu, M Zhou, Z Yang - arxiv preprint arxiv:2004.10093, 2020 - arxiv.org
End-to-end speech translation poses a heavy burden on the encoder, because it has to
transcribe, understand, and learn cross-lingual semantics simultaneously. To obtain a …

Multimodal machine translation through visuals and speech

U Sulubacak, O Caglayan, SA Grönroos, A Rouhe… - Machine …, 2020 - Springer
Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …

Comsl: A composite speech-language model for end-to-end speech-to-text translation

C Le, Y Qian, L Zhou, S Liu, Y Qian… - Advances in Neural …, 2023 - proceedings.neurips.cc
Joint speech-language training is challenging due to the large demand for training data and
GPU consumption, as well as the modality gap between speech and language. We present …