- Academic Search

SpecMaskGIT: Masked generative modeling of audio spectrograms for efficient audio synthesis and beyond

M Comunità, Z Zhong, A Takahashi, S Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advances in generative models that iteratively synthesize audio clips sparked great
success to text-to-audio synthesis (TTA), but with the cost of slow synthesis speed and heavy …

Cited by 3 · All 5 versions

[PDF] arxiv.org

Audiobox TTA-RAG: Improving zero-shot and few-shot text-to-audio with retrieval-augmented generation

M Yang, B Shi, M Le, WN Hsu, A Tjandra - arXiv preprint arXiv:2411.05141, 2024 - arxiv.org

Current leading Text-To-Audio (TTA) generation models suffer from degraded performance
on zero-shot and few-shot settings. It is often challenging to generate high-quality audio for …

Cited by 2 · All 2 versions

[PDF] arxiv.org

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

H Liu, J Wang, R Huang, Y Liu, H Lu, W Xue… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in latent diffusion models (LDMs) have markedly enhanced text-to-
audio generation, yet their iterative sampling processes impose substantial computational …

Cited by 1 · All 2 versions

[PDF] arxiv.org

FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation

J Im, J Nam - arXiv preprint arXiv:2501.10807, 2025 - arxiv.org

Versatile audio super-resolution (SR) is the challenging task of restoring high-frequency
components from low-resolution audio with sampling rates between 4kHz and 32kHz in …

All 2 versions

[PDF] upm.es

[PDF] Generative and parametric models for interactive neural synthesis in speech and audio

MJC Largo - 2024 - oa.upm.es

Speech synthesis is a multifaceted process that encompasses both acoustic signals and
articulatory dynamics. Traditional neural audio synthesis methods often rely exclusively on …


SoundCTM: Uniting score-based and consistency models for text-to-sound generation