Prompting the hidden talent of web-scale speech models for zero-shot task generalization
We investigate the emergent abilities of the recently proposed web-scale speech model
Whisper, by adapting it to unseen tasks with prompt engineering. We selected three tasks …
T-modules: Translation modules for zero-shot cross-modal machine translation
We present a new approach to perform zero-shot cross-modal transfer between speech and
text for translation tasks. Multilingual speech and text are encoded in a joint fixed-size …
Modular speech-to-text translation for zero-shot cross-modal transfer
Recent research has shown that independently trained encoders and decoders, combined
through a shared fixed-size representation, can achieve competitive performance in speech …
Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech understanding
Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained
models. Parameter inefficiency can however arise when, during transfer learning, all the …
End-to-end speech translation with pre-trained models and adapters: UPC at IWSLT 2021
This paper describes the submission to the IWSLT 2021 offline speech translation task by
the UPC Machine Translation group. The task consists of building a system capable of …
Multimodal robustness for neural machine translation
In this paper, we look at the case of a generic text-to-text NMT model that has to deal with
data coming from various modalities, like speech, images, or noisy text extracted from the …
Discrete cross-modal alignment enables zero-shot speech translation
End-to-end Speech Translation (ST) aims at translating the source language speech into
target language text without generating the intermediate transcriptions. However, the …
An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation
Consistency regularization methods, such as R-Drop (Liang et al., 2021) and CrossConST
(Gao et al., 2023), have achieved impressive supervised and zero-shot performance in the …
Towards Zero-shot Learning for End-to-end Cross-modal Translation Models
One of the main problems in speech translation is the mismatch between different
modalities. The second problem, scarcity of parallel data covering multiple modalities …
Learning multilingual and multimodal representations with language-specific encoders and decoders for machine translation
C Escolano Peinado - 2022 - upcommons.upc.edu
This thesis aims to study different language-specific approaches for Multilingual Machine
Translation without parameter sharing and their properties compared to the current state-of …