- Academic Search

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Enregistrer Citer Cité 235 fois Autres articles Les 6 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation

C Wang, M Riviere, A Lee, A Wu, C Talnikar… - arxiv preprint arxiv …, 2021 - arxiv.org

We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …

Enregistrer Citer Cité 499 fois Autres articles Les 9 versions Free GPT-4 DeepSeek Version HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Fairseq S2T: Fast speech-to-text modeling with fairseq

C Wang, Y Tang, X Ma, A Wu, S Popuri… - arxiv preprint arxiv …, 2020 - arxiv.org

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such
as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful …

Enregistrer Citer Cité 275 fois Autres articles Les 5 versions Free GPT-4 DeepSeek Version HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arxiv preprint arxiv …, 2023 - arxiv.org

What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Enregistrer Citer Cité 108 fois Autres articles Version HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

STEMM: Self-learning with speech-text manifold mixup for speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arxiv preprint arxiv:2203.10426, 2022 - arxiv.org

How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …

Enregistrer Citer Cité 100 fois Autres articles Les 8 versions Free GPT-4 DeepSeek Version HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Recent advances in direct speech-to-text translation

C Xu, R Ye, Q Dong, C Zhao, T Ko, M Wang… - arxiv preprint arxiv …, 2023 - arxiv.org

Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …

Enregistrer Citer Cité 20 fois Autres articles Les 4 versions Free GPT-4 DeepSeek Version HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arxiv preprint arxiv …, 2023 - arxiv.org

Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …

Enregistrer Citer Cité 104 fois Autres articles Version HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Unified speech-text pre-training for speech translation and recognition

Y Tang, H Gong, N Dong, C Wang, WN Hsu… - arxiv preprint arxiv …, 2022 - arxiv.org

We describe a method to jointly pre-train speech and text in an encoder-decoder modeling
framework for speech translation and recognition. The proposed method incorporates four …

Enregistrer Citer Cité 83 fois Autres articles Les 6 versions Free GPT-4 DeepSeek Version HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Multilingual speech translation with efficient finetuning of pretrained models

X Li, C Wang, Y Tang, C Tran, Y Tang, J Pino… - arxiv preprint arxiv …, 2020 - arxiv.org

We present a simple yet effective approach to build multilingual speech-to-text (ST)
translation by efficient transfer learning from pretrained speech encoder and text decoder …

Enregistrer Citer Cité 144 fois Autres articles Les 6 versions Free GPT-4 DeepSeek Version HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Cross-modal contrastive learning for speech translation

R Ye, M Wang, L Li - arxiv preprint arxiv:2205.02444, 2022 - arxiv.org

How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …

Enregistrer Citer Cité 85 fois Autres articles Les 9 versions Free GPT-4 DeepSeek Version HTML

Créer l'alerte

Citer

Recherche avancée

Enregistré dans Ma bibliothèque

Self-training for end-to-end speech translation

A review of deep learning techniques for speech processing

VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation

Fairseq S2T: Fast speech-to-text modeling with fairseq

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

STEMM: Self-learning with speech-text manifold mixup for speech translation

Recent advances in direct speech-to-text translation

Seamless: Multilingual Expressive and Streaming Speech Translation

Unified speech-text pre-training for speech translation and recognition

Multilingual speech translation with efficient finetuning of pretrained models

Cross-modal contrastive learning for speech translation