Recent advances in direct speech-to-text translation

C Xu, R Ye, Q Dong, C Zhao, T Ko, M Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …

Cited by 23

STEMM: Self-learning with speech-text manifold mixup for speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arXiv preprint arXiv:2203.10426, 2022 - arxiv.org

How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …

Cited by 103

ESPnet-ST: All-in-one speech translation toolkit

H Inaguma, S Kiyono, K Duh, S Karita… - arXiv preprint arXiv …, 2020 - arxiv.org

We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …

Cited by 179

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier

Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Cited by 7

Cascade versus direct speech translation: Do the differences still make a difference?

L Bentivogli, M Cettolo, M Gaido, A Karakanta… - arXiv preprint arXiv …, 2021 - arxiv.org

Five years after the first published proofs of concept, direct approaches to speech translation
(ST) are now competing with traditional cascade solutions. In light of this steady progress …

Cited by 84

Learning shared semantic space for speech-to-text translation

C Han, M Wang, H Ji, L Li - arXiv preprint arXiv:2105.03095, 2021 - arxiv.org

Having numerous potential applications and great impact, end-to-end speech translation
(ST) has long been treated as an independent task, failing to fully draw strength from the …

Cited by 82

Speech translation and the end-to-end promise: Taking stock of where we are

M Sperber, M Paulik - arXiv preprint arXiv:2004.06358, 2020 - arxiv.org

Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …

Cited by 112

Curriculum pre-training for end-to-end speech translation

C Wang, Y Wu, S Liu, M Zhou, Z Yang - arXiv preprint arXiv:2004.10093, 2020 - arxiv.org

End-to-end speech translation poses a heavy burden on the encoder, because it has to
transcribe, understand, and learn cross-lingual semantics simultaneously. To obtain a …

Cited by 110

Multimodal machine translation through visuals and speech

U Sulubacak, O Caglayan, SA Grönroos, A Rouhe… - Machine …, 2020 - Springer

Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …

Cited by 89

ComSL: A composite speech-language model for end-to-end speech-to-text translation

C Le, Y Qian, L Zhou, S Liu, Y Qian… - Advances in Neural …, 2023 - proceedings.neurips.cc

Joint speech-language training is challenging due to the large demand for training data and
GPU consumption, as well as the modality gap between speech and language. We present …

Cited by 11
