UnitY: Two-pass direct speech-to-speech translation with discrete units

H Inaguma, S Popuri, I Kulikov, PJ Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
Direct speech-to-speech translation (S2ST), in which all components can be optimized
jointly, is advantageous over cascaded approaches to achieve fast inference with a …

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Prompting the hidden talent of web-scale speech models for zero-shot task generalization

P Peng, B Yan - 2023 - par.nsf.gov
We investigate the emergent abilities of the recently proposed web-scale speech model
Whisper, by adapting it to unseen tasks with prompt engineering. We selected three tasks …

Translatotron 3: Speech to speech translation with monolingual data

E Nachmani, A Levkovitch, Y Ding… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-
speech translation from monolingual speech-text datasets by combining masked …

Joint pre-training with speech and bilingual text for direct speech to speech translation

K Wei, L Zhou, Z Zhang, L Chen, S Liu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Direct speech-to-speech translation (S2ST) is an attractive research topic with many
advantages compared to cascaded S2ST. However, direct S2ST suffers from the data …

Direct Speech-to-Speech Neural Machine Translation: A Survey

M Gupta, M Dutta, CK Maurya - arXiv preprint arXiv:2411.14453, 2024 - arxiv.org
Speech-to-Speech Translation (S2ST) models transform speech from one language to
another target language with the same linguistic information. S2ST is important for bridging …

Improving cascaded unsupervised speech translation with denoising back-translation

YK Fu, LH Tseng, J Shi, CA Li, TY Hsu… - arXiv preprint arXiv …, 2023 - arxiv.org
Most of the speech translation models heavily rely on parallel data, which is hard to collect
especially for low-resource languages. To tackle this issue, we propose to build a cascaded …

Kazakh-Uzbek speech cascade machine translation on complete set of endings

T Balabekova, B Kairatuly, U Tukeyev - International Conference on …, 2023 - Springer
Studies of speech-to-speech machine translation for Turkic languages are practically absent
due to the difficulties of creating parallel speech corpora for training neural models …

SimulTron: On-Device Simultaneous Speech to Speech Translation

A Agranovich, E Nachmani, O Rybakov, Y Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down
communication barriers and enabling fluid conversations across languages. However …

Transformer-Based End-to-End Speech Translation With Rotary Position Embedding

X Li, S Li, XL Zhang, S Rahardja - IEEE Signal Processing …, 2024 - ieeexplore.ieee.org
Recently, many Transformer-based models have been applied to end-to-end speech
translation because of their capability to model global dependencies. Position embedding is …