Google Scholar

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Cited by 35 · All 4 versions

[PDF] jmlr.org

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org

Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Cited by 298 · All 3 versions

[PDF] arxiv.org

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called VALL-E) using discrete codes derived from …

Cited by 637 · All 3 versions

[HTML] nih.gov

A high-performance neuroprosthesis for speech decoding and avatar control

SL Metzger, KT Littlejohn, AB Silva, DA Moses… - Nature, 2023 - nature.com

Speech neuroprostheses have the potential to restore communication to people living with
paralysis, but naturalistic speed and expressivity are elusive. Here we use high-density …

Cited by 261 · All 9 versions

[PDF] arxiv.org

AudioLM: a language modeling approach to audio generation

Z Borsos, R Marinier, D Vincent… - … ACM transactions on …, 2023 - ieeexplore.ieee.org

We introduce AudioLM, a framework for high-quality audio generation with long-term
consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts …

Cited by 602 · All 5 versions

[PDF] neurips.cc

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024 - proceedings.neurips.cc

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Cited by 255 · All 8 versions

[PDF] mit.edu

Speak, read and prompt: High-fidelity text-to-speech with minimal supervision

E Kharitonov, D Vincent, Z Borsos… - Transactions of the …, 2023 - direct.mit.edu

We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained
with minimal supervision. By combining two types of discrete speech representations, we …

Cited by 183 · All 5 versions

[PDF] arxiv.org

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 217 · All 3 versions

[PDF] arxiv.org

UniAudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language models (LLM) have demonstrated the capability to handle a variety of
generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

Cited by 106 · All 3 versions

[PDF] arxiv.org

ASVspoof 2021: Towards spoofed and deepfake speech detection in the wild

X Liu, X Wang, M Sahidullah, J Patino… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Benchmarking initiatives support the meaningful comparison of competing solutions to
prominent problems in speech and language processing. Successive benchmarking …

Cited by 191 · All 6 versions

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

Sparks of large audio models: A survey and outlook
