An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Deepfakes generation and detection: a short survey

Z Akhtar - Journal of Imaging, 2023 - mdpi.com
Advancements in deep learning techniques and the availability of free, large databases
have made it possible, even for non-technical people, to either manipulate or generate …

Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models

R Huang, J Huang, D Yang, Y Ren… - International …, 2023 - proceedings.mlr.press
Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2023 - proceedings.neurips.cc
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called VALL-E) using discrete codes derived from …

Speak, read and prompt: High-fidelity text-to-speech with minimal supervision

E Kharitonov, D Vincent, Z Borsos… - Transactions of the …, 2023 - direct.mit.edu
We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained
with minimal supervision. By combining two types of discrete speech representations, we …

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

StyleTTS 2: Towards human-level text-to-speech through style diffusion and adversarial training with large speech language models

YA Li, C Han, V Raghavan… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style
diffusion and adversarial training with large speech language models (SLMs) to achieve …

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …

BigVGAN: A universal neural vocoder with large-scale training

S Lee, W Ping, B Ginsburg, B Catanzaro… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …