- Academic Search

Speak foreign languages with your own voice: Cross-lingual neural codec language modeling

Z Zhang, L Zhou, C Wang, S Chen, Y Wu, S Liu… - arxiv preprint arxiv …, 2023 - arxiv.org

We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual
speech synthesis. Specifically, we extend VALL-E and train a multi-lingual conditional codec …

Cited by 164

[PDF] arxiv.org

Reproducing whisper-style training using an open-source toolkit and publicly available data

Y Peng, J Tian, B Yan, D Berrebbi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Pre-training speech models on large volumes of data has achieved remarkable success.
OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised …

Cited by 46

[PDF] arxiv.org

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier

Abstract Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Cited by 6

[PDF] arxiv.org

M³ST: Mix at Three Levels for Speech Translation

X Cheng, Q Dong, F Yue, T Ko… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's
well known that data augmentation is an efficient method to improve performance for many …

Cited by 54

[PDF] um.edu.mt

Findings of the IWSLT 2023 evaluation campaign

M Agarwal, S Agarwal, A Anastasopoulos, L Bentivogli… - 2023 - um.edu.mt

This paper reports on the shared tasks organized by the 20th IWSLT Conference. The
shared tasks address 9 scientific challenges in spoken language translation: simultaneous …

Cited by 52

[PDF] arxiv.org

Speech translation with large language models: An industrial practice

Z Huang, R Ye, T Ko, Q Dong, S Cheng… - arxiv preprint arxiv …, 2023 - arxiv.org

Given the great success of large language models (LLMs) across various tasks, in this
paper, we introduce LLM-ST, a novel and effective speech translation model constructed …

Cited by 12

[PDF] arxiv.org

Vec-tok speech: speech vectorization and tokenization for neural speech generation

X Zhu, Y Lv, Y Lei, T Li, W He, H Zhou, H Lu… - arxiv preprint arxiv …, 2023 - arxiv.org

Language models (LMs) have recently flourished in natural language processing and
computer vision, generating high-fidelity texts or images in various tasks. In contrast, the …

Cited by 13

[PDF] arxiv.org

On the effects of heterogeneous data sources on speech-to-text foundation models

J Tian, Y Peng, W Chen, K Choi, K Livescu… - arxiv preprint arxiv …, 2024 - arxiv.org

The Open Whisper-style Speech Model (OWSM) series was introduced to achieve full
transparency in building advanced speech-to-text (S2T) foundation models. To this end …

Cited by 5

[PDF] isca-archive.org

[PDF] LAMASSU: A streaming language-agnostic multilingual speech recognition and translation model using neural transducers

P Wang, E Sun, J Xue, Y Wu, L Zhou, Y Gaur… - Proc …, 2023 - isca-archive.org

Automatic speech recognition (ASR) and speech translation (ST) can both use neural
transducers as the model structure. It is thus possible to use a single transducer model to …

Cited by 14

[PDF] arxiv.org

A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models

D Wang, M Cui, D Yang, X Chen, H Meng - arxiv preprint arxiv …, 2024 - arxiv.org

With the rise of Speech Large Language Models (Speech LLMs), there has been growing
interest in discrete speech tokens for their ability to integrate with text-based tokens …

Cited by 2

Gigast: A 10,000-hour pseudo speech translation corpus

Speak foreign languages with your own voice: Cross-lingual neural codec language modeling
