STEMM: Self-learning with speech-text manifold mixup for speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arXiv preprint arXiv:2203.10426, 2022 - arxiv.org
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …
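
For illustration, a minimal sketch of the general speech-text manifold mixup idea named in the title (not the authors' exact method): positions in a speech embedding sequence are randomly swapped with the corresponding text token embeddings before a shared encoder. The position-aligned sequences and the mixing probability p are simplifying assumptions.

    import torch

    def speech_text_mixup(speech_emb, text_emb, p=0.5):
        # speech_emb, text_emb: (batch, seq_len, dim) embedding sequences,
        # assumed position-aligned here purely for illustration.
        mask = torch.rand(speech_emb.shape[:2], device=speech_emb.device) < p
        mask = mask.unsqueeze(-1)                        # (batch, seq_len, 1)
        # Replace a random subset of speech positions with text embeddings,
        # yielding a mixed-modal sequence for a shared encoder.
        return torch.where(mask, text_emb, speech_emb)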

Recent advances in direct speech-to-text translation

C Xu, R Ye, Q Dong, C Zhao, T Ko, M Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech-to-text translation has recently attracted increasing attention, and many studies
have emerged in quick succession. In this paper, we present a comprehensive survey on direct speech …

Findings of the IWSLT 2022 Evaluation Campaign.

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …

Cross-modal contrastive learning for speech translation

R Ye, M Wang, L Li - arXiv preprint arXiv:2205.02444, 2022 - arxiv.org
How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …
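
As a hedged sketch of the general cross-modal contrastive idea (an InfoNCE-style objective over pooled speech/text sentence embeddings), not the paper's exact loss; the function name and temperature value are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def cross_modal_contrastive_loss(speech_repr, text_repr, temperature=0.05):
        # speech_repr, text_repr: (batch, dim) pooled sentence-level vectors
        s = F.normalize(speech_repr, dim=-1)
        t = F.normalize(text_repr, dim=-1)
        logits = s @ t.T / temperature                   # (batch, batch) similarities
        targets = torch.arange(s.size(0), device=s.device)
        # Symmetric objective: each speech vector should match its paired text
        # and vice versa.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.T, targets))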

Lightweight adapter tuning for multilingual speech translation

H Le, J Pino, C Wang, J Gu, D Schwab… - arXiv preprint arXiv …, 2021 - arxiv.org
Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP.
Adapter tuning consists of freezing the pretrained parameters of a model and injecting …
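
A minimal sketch of the bottleneck-adapter mechanism described above (freeze the pretrained model, train only the small injected modules); the dimensions, activation, and placement are illustrative assumptions rather than the paper's exact configuration:

    import torch.nn as nn

    class Adapter(nn.Module):
        # Small residual bottleneck inserted into a frozen pretrained layer;
        # only these parameters are updated during adapter tuning.
        def __init__(self, d_model=512, bottleneck=64):
            super().__init__()
            self.norm = nn.LayerNorm(d_model)
            self.down = nn.Linear(d_model, bottleneck)
            self.up = nn.Linear(bottleneck, d_model)
            self.act = nn.ReLU()

        def forward(self, x):
            return x + self.up(self.act(self.down(self.norm(x))))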

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds application in various domains, such as …

Learning shared semantic space for speech-to-text translation

C Han, M Wang, H Ji, L Li - arXiv preprint arXiv:2105.03095, 2021 - arxiv.org
Despite its numerous potential applications and great impact, end-to-end speech translation
(ST) has long been treated as an independent task, failing to fully draw strength from the …

End-to-end speech translation via cross-modal progressive training

R Ye, M Wang, L Li - arXiv preprint arXiv:2104.10380, 2021 - arxiv.org
End-to-end speech translation models have become a new research trend due to their
potential to reduce error propagation. However, these models still suffer from the …

Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation

R Zheng, J Chen, M Ma… - … Conference on Machine …, 2021 - proceedings.mlr.press
Recently, representation learning for text and speech has successfully improved many
language-related tasks. However, all existing methods suffer from two limitations: (a) they …

CMOT: Cross-modal mixup via optimal transport for speech translation

Y Zhou, Q Fang, Y Feng - arXiv preprint arXiv:2305.14635, 2023 - arxiv.org
End-to-end speech translation (ST) is the task of translating speech signals in the source
language into text in the target language. As a cross-modal task, end-to-end ST is difficult to …
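
A hedged sketch of the general idea suggested by the title, cross-modal mixup guided by optimal transport, not the authors' exact algorithm: a Sinkhorn plan aligns speech frames to text tokens, and each frame is mixed with its transport-weighted text embedding. The cost function, uniform marginals, and hyperparameters are illustrative assumptions.

    import torch

    def sinkhorn(cost, eps=0.1, n_iters=50):
        # Entropic OT between uniform marginals over speech frames and text tokens.
        K = torch.exp(-cost / eps)                       # (m, n) kernel
        a = torch.full((cost.size(0),), 1.0 / cost.size(0), device=cost.device)
        b = torch.full((cost.size(1),), 1.0 / cost.size(1), device=cost.device)
        u, v = torch.ones_like(a), torch.ones_like(b)
        for _ in range(n_iters):
            u = a / (K @ v)
            v = b / (K.T @ u)
        return u.unsqueeze(1) * K * v.unsqueeze(0)       # transport plan (m, n)

    def ot_mixup(speech_emb, text_emb, lam=0.5):
        # speech_emb: (m, d) speech frames; text_emb: (n, d) token embeddings
        cost = torch.cdist(speech_emb, text_emb)         # Euclidean cost matrix
        plan = sinkhorn(cost)
        aligned = (plan / plan.sum(dim=1, keepdim=True)) @ text_emb  # barycentric map
        return lam * speech_emb + (1.0 - lam) * aligned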