Direct speech-to-speech translation with discrete units

A Lee, PJ Chen, C Wang, J Gu, S Popuri, X Ma… - arXiv preprint arXiv …, 2021 - arxiv.org
We present a direct speech-to-speech translation (S2ST) model that translates speech from
one language to speech in another language without relying on intermediate text …

Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org
Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …

Direct speech-to-speech translation with a sequence-to-sequence model

Y Jia, RJ Weiss, F Biadsy, W Macherey… - arXiv preprint arXiv …, 2019 - arxiv.org
We present an attention-based sequence-to-sequence neural network which can directly
translate speech from one language into speech in another language, without relying on an …

Enhanced speech-to-speech translation system and methods for adding a new word

A Waibel, IR Lane - US Patent 8,972,268, 2015 - Google Patents
A speech translation system and methods for cross-lingual communication
that enable users to improve and modify content and usage of the system and easily abort …

Speech translation and the end-to-end promise: Taking stock of where we are

M Sperber, M Paulik - arXiv preprint arXiv:2004.06358, 2020 - arxiv.org
Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …

A holistic cascade system, benchmark, and human evaluation protocol for expressive speech-to-speech translation

WC Huang, B Peloquin, J Kao, C Wang… - ICASSP 2023 - 2023 …, 2023 - ieeexplore.ieee.org
Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of
source speech to target speech while maintaining translation accuracy. Existing research in …

End-to-end speech translation with transcoding by multi-task learning for distant language pairs

T Kano, S Sakti, S Nakamura - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org
Directly translating spoken utterances from a source language to a target language is
challenging because it requires a fundamental transformation in both linguistic and para/non …

Structured-based curriculum learning for end-to-end English-Japanese speech translation

T Kano, S Sakti, S Nakamura - arXiv preprint arXiv:1802.06003, 2018 - arxiv.org
Sequence-to-sequence attentional-based neural network architectures have been shown to
provide a powerful model for machine translation and speech recognition. Recently, several …

Enhancing speech-to-speech translation with multiple TTS targets

J Shi, Y Tang, A Lee, H Inaguma… - ICASSP 2023 - 2023 …, 2023 - ieeexplore.ieee.org
It has been known that direct speech-to-speech translation (S2ST) models usually suffer
from the data scarcity issue because of the limited existing parallel materials for both source …

Controlling prosody in end-to-end TTS: A case study on contrastive focus generation

S Latif, I Kim, I Calapodescu… - Proceedings of the 25th …, 2021 - aclanthology.org
While End-2-End Text-to-Speech (TTS) has made significant progresses over the
past few years, these systems still lack intuitive user controls over prosody. For instance …