„Google“ mokslinčius

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Išsaugoti Cituoti Cituoja 242 Susiję straipsniai Visos 7 versijos

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

A complete survey on generative ai (aigc): Is chatgpt from gpt-4 to gpt-5 all you need?

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arxiv preprint arxiv …, 2023 - arxiv.org

As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

Išsaugoti Cituoti Cituoja 210 Susiję straipsniai Visos 4 versijos HTML kopija

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2023 - proceedings.neurips.cc

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Išsaugoti Cituoti Cituoja 268 Susiję straipsniai Visos 9 versijos HTML kopija

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

High-fidelity audio compression with improved rvqgan

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2023 - proceedings.neurips.cc

Abstract Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

Išsaugoti Cituoti Cituoja 268 Susiję straipsniai Visos 5 versijos HTML kopija

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arxiv preprint arxiv …, 2023 - arxiv.org

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called Vall-E) using discrete codes derived from …

Išsaugoti Cituoti Cituoja 655 Susiję straipsniai Visos 3 versijos HTML kopija

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Symphonize 3d semantic scene completion with contextual instance queries

H Jiang, T Cheng, N Gao, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract 3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal
undertaking in autonomous driving aiming to predict the voxel occupancy within volumetric …

Išsaugoti Cituoti Cituoja 214 Susiję straipsniai Visos 10 versijos HTML kopija

[Free GPT-4]
[DeepSeek]

[PDF] mit.edu

Speak, read and prompt: High-fidelity text-to-speech with minimal supervision

E Kharitonov, D Vincent, Z Borsos… - Transactions of the …, 2023 - direct.mit.edu

We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained
with minimal supervision. By combining two types of discrete speech representations, we …

Išsaugoti Cituoti Cituoja 190 Susiję straipsniai Visos 5 versijos

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arxiv preprint arxiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Išsaugoti Cituoti Cituoja 234 Susiję straipsniai Visos 4 versijos HTML kopija

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Naturalspeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Z Ju, Y Wang, K Shen, X Tan, D **n, D Yang… - arxiv preprint arxiv …, 2024 - arxiv.org

While recent large-scale text-to-speech (TTS) models have achieved significant progress,
they still fall short in speech quality, similarity, and prosody. Considering speech intricately …

Išsaugoti Cituoti Cituoja 146 Susiję straipsniai Visos 8 versijos HTML kopija

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Mofusion: A framework for denoising-diffusion-based motion synthesis

R Dabral, MH Mughal, V Golyanik… - Proceedings of the …, 2023 - openaccess.thecvf.com

Conventional methods for human motion synthesis have either been deterministic or have
had to struggle with the trade-off between motion diversity vs motion quality. In response to …

Išsaugoti Cituoti Cituoja 154 Susiję straipsniai Visos 11 versijos HTML kopija

Kurti įspėjimą

Cituoti

Išplėstinė paieška

Išsaugota skiltyje „Mano biblioteka“

Natural tts synthesis by conditioning wavenet on mel spectrogram predictions

A review of deep learning techniques for speech processing

A complete survey on generative ai (aigc): Is chatgpt from gpt-4 to gpt-5 all you need?

Voicebox: Text-guided multilingual universal speech generation at scale

High-fidelity audio compression with improved rvqgan

Neural codec language models are zero-shot text to speech synthesizers

Symphonize 3d semantic scene completion with contextual instance queries

Speak, read and prompt: High-fidelity text-to-speech with minimal supervision

Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

Naturalspeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Mofusion: A framework for denoising-diffusion-based motion synthesis