Direct speech-to-speech translation with discrete units
We present a direct speech-to-speech translation (S2ST) model that translates speech from
one language to speech in another language without relying on intermediate text …
Seamless: Multilingual Expressive and Streaming Speech Translation
Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …
Direct speech-to-speech translation with a sequence-to-sequence model
We present an attention-based sequence-to-sequence neural network which can directly
translate speech from one language into speech in another language, without relying on an …
Enhanced speech-to-speech translation system and methods for adding a new word
A speech translation system and methods for cross-lingual communication
that enable users to improve and modify content and usage of the system and easily abort …
Speech translation and the end-to-end promise: Taking stock of where we are
M Sperber, M Paulik - arXiv preprint arXiv:2004.06358, 2020 - arxiv.org
Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …
A holistic cascade system, benchmark, and human evaluation protocol for expressive speech-to-speech translation
Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of
source speech to target speech while maintaining translation accuracy. Existing research in …
End-to-end speech translation with transcoding by multi-task learning for distant language pairs
Directly translating spoken utterances from a source language to a target language is
challenging because it requires a fundamental transformation in both linguistic and para/non …
Structured-based curriculum learning for end-to-end English-Japanese speech translation
Sequence-to-sequence attentional-based neural network architectures have been shown to
provide a powerful model for machine translation and speech recognition. Recently, several …
Enhancing speech-to-speech translation with multiple TTS targets
It has been known that direct speech-to-speech translation (S2ST) models usually suffer
from the data scarcity issue because of the limited existing parallel materials for both source …
Controlling prosody in end-to-end TTS: A case study on contrastive focus generation
While End-2-End Text-to-Speech (TTS) has made significant progresses over the
past few years, these systems still lack intuitive user controls over prosody. For instance …