Google Scholar

Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

Large-scale automatic speech translation systems today lack key features that help
machine-mediated communication feel seamless when compared to human-to-human dialogue. In …

Cited by 110

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Cited by 115

VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation

C Wang, M Riviere, A Lee, A Wu, C Talnikar… - arXiv preprint arXiv …, 2021 - arxiv.org

We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …

Cited by 507

Direct speech-to-speech translation with discrete units

A Lee, PJ Chen, C Wang, J Gu, S Popuri, X Ma… - arXiv preprint arXiv …, 2021 - arxiv.org

We present a direct speech-to-speech translation (S2ST) model that translates speech from
one language to speech in another language without relying on intermediate text …

Cited by 172

CVSS corpus and massively multilingual speech-to-speech translation

Y Jia, MT Ramanovich, Q Wang, H Zen - arXiv preprint arXiv:2201.03713, 2022 - arxiv.org

We introduce CVSS, a massively multilingual-to-English speech-to-speech translation
(S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English …

Cited by 79

UnitY: Two-pass direct speech-to-speech translation with discrete units

H Inaguma, S Popuri, I Kulikov, PJ Chen… - arXiv preprint arXiv …, 2022 - arxiv.org

Direct speech-to-speech translation (S2ST), in which all components can be optimized
jointly, is advantageous over cascaded approaches to achieve fast inference with a …

Cited by 46

Enhanced direct speech-to-speech translation using self-supervised pre-training and data augmentation

S Popuri, PJ Chen, C Wang, J Pino, Y Adi, J Gu… - arXiv preprint arXiv …, 2022 - arxiv.org

Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues as there
exists little parallel S2ST data, compared to the amount of data available for conventional …

Cited by 66

Speech translation and the end-to-end promise: Taking stock of where we are

M Sperber, M Paulik - arXiv preprint arXiv:2004.06358, 2020 - arxiv.org

Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …

Cited by 111

Text-free image-to-speech synthesis using learned segmental units

WN Hsu, D Harwath, C Song, J Glass - arXiv preprint arXiv:2012.15454, 2020 - arxiv.org

In this paper we present the first model for directly synthesizing fluent, natural-sounding
spoken audio captions for images that does not require natural language text as an …

Cited by 83

nnAudio: An on-the-fly GPU audio to spectrogram conversion toolbox using 1D convolutional neural networks

KW Cheuk, H Anderson, K Agres, D Herremans - IEEE Access, 2020 - ieeexplore.ieee.org

In this paper, we present nnAudio, a new neural network-based audio processing framework
with graphics processing unit (GPU) support that leverages 1D convolutional neural …

Cited by 107

Speech-to-speech translation between untranscribed unknown languages

Seamless: Multilingual Expressive and Streaming Speech Translation
