
A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 242 · All 7 versions

[PDF] mlr.press

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press

We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Cited by 3952 · All 11 versions

[PDF] arxiv.org

SpeechT5: Unified-modal encoder-decoder pre-training for spoken language processing

J Ao, R Wang, L Zhou, C Wang, S Ren, Y Wu… - arXiv preprint arXiv …, 2021 - arxiv.org

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural
language processing models, we propose a unified-modal SpeechT5 framework that …

Cited by 253 · All 6 versions

[PDF] arxiv.org

VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation

C Wang, M Riviere, A Lee, A Wu, C Talnikar… - arXiv preprint arXiv …, 2021 - arxiv.org

We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …

Cited by 505 · All 10 versions

[PDF] arxiv.org

Direct speech-to-speech translation with discrete units

A Lee, PJ Chen, C Wang, J Gu, S Popuri, X Ma… - arXiv preprint arXiv …, 2021 - arxiv.org

We present a direct speech-to-speech translation (S2ST) model that translates speech from
one language to speech in another language without relying on intermediate text …

Cited by 172 · All 7 versions

[PDF] arxiv.org

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org

The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Cited by 71 · All 5 versions

[PDF] isca-archive.org

[PDF] CoVoST 2 and massively multilingual speech translation

C Wang, A Wu, J Gu, J Pino - Interspeech, 2021 - isca-archive.org

Speech translation (ST) is an increasingly popular topic of research, partly due to the
development of benchmark datasets. Nevertheless, current datasets cover a limited number …

Cited by 139 · All 3 versions

[PDF] arxiv.org

STEMM: Self-learning with speech-text manifold mixup for speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arXiv preprint arXiv:2203.10426, 2022 - arxiv.org

How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …

Cited by 102 · All 9 versions

[PDF] academia.edu

[PDF] Speech emotion recognition with multi-task learning

X Cai, J Yuan, R Zheng, L Huang, K Church - Interspeech, 2021 - academia.edu

Speech emotion recognition (SER) classifies speech into emotion categories such as:
Happy, Angry, Sad and Neutral. Recently, deep learning has been applied to the SER task …

Cited by 121 · All 4 versions

[PDF] arxiv.org

Cross-modal contrastive learning for speech translation

R Ye, M Wang, L Li - arXiv preprint arXiv:2205.02444, 2022 - arxiv.org

How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …

Cited by 89 · All 9 versions
