NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 229 · All 4 versions

[PDF] arxiv.org

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Cited by 12 · All 4 versions

[PDF] arxiv.org

Matcha-TTS: A fast TTS architecture with conditional flow matching

S Mehta, R Tu, J Beskow, É Székely… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic
modelling, trained using optimal-transport conditional flow matching (OT-CFM). This yields …

Cited by 70 · All 5 versions
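
The Matcha-TTS snippet names optimal-transport conditional flow matching (OT-CFM) as its training objective. The Python sketch below illustrates the general OT-CFM idea on toy vectors and is not the paper's code. A noise sample x0 is moved toward a data point x1 along a straight line, and a model is regressed onto that line's constant velocity. Several details are illustrative assumptions: the constant `SIGMA_MIN`, the helper names `ot_cfm_sample`, `ot_cfm_loss` and `euler_sample`, and the use of plain lists in place of acoustic features. `model` stands for any callable `(x, t) -> velocity`.

```python
import random

# Assumed small noise floor for the OT-CFM path; an illustrative value,
# not a hyperparameter taken from the Matcha-TTS listing.
SIGMA_MIN = 1e-4


def ot_cfm_sample(x1, sigma_min=SIGMA_MIN, rng=random):
    """Draw one OT-CFM training triple (t, x_t, u_t) for data vector x1."""
    t = rng.random()                        # t ~ U[0, 1)
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # x0 ~ N(0, I), the noise sample
    c = 1.0 - sigma_min
    # Point on the straight conditional path from x0 (t=0) toward x1 (t=1).
    xt = [(1.0 - c * t) * a + t * b for a, b in zip(x0, x1)]
    # Target velocity d(x_t)/dt, which is constant along the path.
    ut = [b - c * a for a, b in zip(x0, x1)]
    return t, xt, ut


def ot_cfm_loss(model, x1, sigma_min=SIGMA_MIN, rng=random):
    """Mean squared error between model(x_t, t) and the OT-CFM target."""
    t, xt, ut = ot_cfm_sample(x1, sigma_min, rng)
    vt = model(xt, t)
    return sum((v - u) ** 2 for v, u in zip(vt, ut)) / len(x1)


def euler_sample(model, dim, n_steps=10, rng=random):
    """Integrate dx/dt = model(x, t) from noise at t=0 to t=1 with fixed steps."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = model(x, i * dt)
        x = [a + dt * b for a, b in zip(x, v)]
    return x
```

Each conditional path is a straight line, so the exact velocity field keeps an Euler step on that line. With the exact field, even a few steps end at x1 + SIGMA_MIN·x0. A trained network only approximates this field, so how many steps are needed in practice is an empirical question.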

[PDF] arxiv.org

HierSpeech++: Bridging the gap between semantic and acoustic representation of speech by hierarchical variational inference for zero-shot speech synthesis

SH Lee, HY Choi, SB Kim, SW Lee - arXiv preprint arXiv:2311.12454, 2023 - arxiv.org

Large language models (LLM)-based speech synthesis has been widely adopted in zero-shot
speech synthesis. However, they require a large-scale data and possess the same …

Cited by 29 · All 2 versions

[PDF] arxiv.org

VoiceFlow: Efficient text-to-speech with rectified flow matching

Y Guo, C Du, Z Ma, X Chen, K Yu - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Although diffusion models in text-to-speech have become a popular choice due to their
strong generative ability, the intrinsic complexity of sampling from diffusion models harms …

Cited by 34 · All 3 versions

[PDF] arxiv.org

FlashSpeech: Efficient zero-shot speech synthesis

Z Ye, Z Ju, H Liu, X Tan, J Chen, Y Lu, P Sun… - Proceedings of the …, 2024 - dl.acm.org

Recent progress in large-scale zero-shot speech synthesis has been significantly advanced
by language models and diffusion models. However, the generation process of both …

Cited by 12 · All 5 versions

[PDF] arxiv.org

Schrodinger bridges beat diffusion models on text-to-speech synthesis

Z Chen, G He, K Zheng, X Tan, J Zhu - arXiv preprint arXiv:2312.03491, 2023 - arxiv.org

In text-to-speech (TTS) synthesis, diffusion models have achieved promising generation
quality. However, because of the pre-defined data-to-noise diffusion process, their prior …

Cited by 21 · All 2 versions

[PDF] arxiv.org

Autoregressive diffusion transformer for text-to-speech synthesis

Z Liu, S Wang, S Inoue, Q Bai, H Li - arXiv preprint arXiv:2406.05551, 2024 - arxiv.org

Audio language models have recently emerged as a promising approach for various audio
generation tasks, relying on audio tokenizers to encode waveforms into sequences of …

Cited by 14 · All 2 versions

[PDF] arxiv.org

AudioLCM: Text-to-audio generation with latent consistency models

H Liu, R Huang, Y Liu, H Cao, J Wang, X Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the
forefront of various generative tasks. However, their iterative sampling process poses a …

Cited by 9 · All 3 versions

[PDF] arxiv.org

ReFlow-TTS: A rectified flow model for high-fidelity text-to-speech

W Guan, Q Su, H Zhou, S Miao, X **e… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

The diffusion models including Denoising Diffusion Probabilistic Models (DDPM) and score-based
generative models have demonstrated excellent performance in speech synthesis …

Cited by 14 · All 3 versions
