- Academic Search

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 229

Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer

With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …

Cited by 192

AudioLM: A language modeling approach to audio generation

Z Borsos, R Marinier, D Vincent… - … ACM transactions on …, 2023 - ieeexplore.ieee.org

We introduce AudioLM, a framework for high-quality audio generation with long-term
consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts …

Cited by 589

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

Cited by 297

Speak, read and prompt: High-fidelity text-to-speech with minimal supervision

E Kharitonov, D Vincent, Z Borsos… - Transactions of the …, 2023 - direct.mit.edu

We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained
with minimal supervision. By combining two types of discrete speech representations, we …

Cited by 180

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Cited by 1834

Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 440

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Cited by 405

FLEURS: Few-shot learning evaluation of universal representations of speech

A Conneau, M Ma, S Khanuja, Y Zhang… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of
Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on …

Cited by 281

Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org

The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

Cited by 154

W2v-BERT: Combining contrastive learning and masked language modeling for self-supervised...

A review of deep learning techniques for speech processing
