Recent advances in direct speech-to-text translation
Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …
STEMM: Self-learning with speech-text manifold mixup for speech translation
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …
ESPnet-ST: All-in-one speech translation toolkit
We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …
End-to-end speech-to-text translation: A survey
N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Abstract Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …
Cascade versus direct speech translation: Do the differences still make a difference?
Five years after the first published proofs of concept, direct approaches to speech translation
(ST) are now competing with traditional cascade solutions. In light of this steady progress …
Learning shared semantic space for speech-to-text translation
Having numerous potential applications and great impact, end-to-end speech translation
(ST) has long been treated as an independent task, failing to fully draw strength from the …
Speech translation and the end-to-end promise: Taking stock of where we are
M Sperber, M Paulik - arXiv preprint arXiv:2004.06358, 2020 - arxiv.org
Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …
Curriculum pre-training for end-to-end speech translation
End-to-end speech translation poses a heavy burden on the encoder, because it has to
transcribe, understand, and learn cross-lingual semantics simultaneously. To obtain a …
Multimodal machine translation through visuals and speech
Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …
ComSL: A composite speech-language model for end-to-end speech-to-text translation
Joint speech-language training is challenging due to the large demand for training data and
GPU consumption, as well as the modality gap between speech and language. We present …