Prompting the hidden talent of web-scale speech models for zero-shot task generalization

P Peng, B Yan, S Watanabe, D Harwath - arXiv preprint arXiv:2305.11095, 2023 - arxiv.org
We investigate the emergent abilities of the recently proposed web-scale speech model
Whisper, by adapting it to unseen tasks with prompt engineering. We selected three tasks …

T-modules: Translation modules for zero-shot cross-modal machine translation

PA Duquenne, H Gong, B Sagot, H Schwenk - arXiv preprint arXiv …, 2022 - arxiv.org
We present a new approach to perform zero-shot cross-modal transfer between speech and
text for translation tasks. Multilingual speech and text are encoded in a joint fixed-size …

Modular speech-to-text translation for zero-shot cross-modal transfer

PA Duquenne, H Schwenk, B Sagot - arXiv preprint arXiv:2310.03724, 2023 - arxiv.org
Recent research has shown that independently trained encoders and decoders, combined
through a shared fixed-size representation, can achieve competitive performance in speech …

Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech understanding

Y Li, A Mehrish, R Bhardwaj… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained
models. Parameter inefficiency can however arise when, during transfer learning, all the …

End-to-end speech translation with pre-trained models and adapters: UPC at IWSLT 2021

GI Gállego, I Tsiamas, C Escolano… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper describes the submission to the IWSLT 2021 offline speech translation task by
the UPC Machine Translation group. The task consists of building a system capable of …

Multimodal robustness for neural machine translation

Y Zhao, I Calapodescu - Proceedings of the 2022 conference on …, 2022 - aclanthology.org
In this paper, we look at the case of a Generic text-to-text NMT model that has to deal with
data coming from various modalities, like speech, images, or noisy text extracted from the …

Discrete cross-modal alignment enables zero-shot speech translation

C Wang, Y Liu, B Chen, J Zhang, W Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-end Speech Translation (ST) aims at translating the source language speech into
target language text without generating the intermediate transcriptions. However, the …

An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation

P Gao, R Zhang, Z He, H Wu, H Wang - arXiv preprint arXiv:2308.14482, 2023 - arxiv.org
Consistency regularization methods, such as R-Drop (Liang et al., 2021) and CrossConST
(Gao et al., 2023), have achieved impressive supervised and zero-shot performance in the …

Towards Zero-shot Learning for End-to-end Cross-modal Translation Models

J Yang, K Fan, M Liao, B Chen… - Findings of the …, 2023 - aclanthology.org
One of the main problems in speech translation is the mismatch between different
modalities. The second problem, scarcity of parallel data covering multiple modalities …

Learning multilingual and multimodal representations with language-specific encoders and decoders for machine translation

C Escolano Peinado - 2022 - upcommons.upc.edu
This thesis aims to study different language-specific approaches for Multilingual Machine
Translation without parameter sharing and their properties compared to the current state-of …