- Academic Search

Mustango: Toward controllable text-to-music generation

J Melechovsky, Z Guo, D Ghosal, N Majumder… - arXiv preprint arXiv …, 2023 - arxiv.org

With recent advancements in text-to-audio and text-to-music based on latent diffusion
models, the quality of generated content has been reaching new heights. The controllability …

Cited by 48

Loop Copilot: Conducting AI ensembles for music generation and iterative editing

Y Zhang, A Maezawa, G Xia, K Yamamoto… - arXiv preprint arXiv …, 2023 - arxiv.org

Creating music is iterative, requiring varied methods at each stage. However, existing AI
music systems fall short in orchestrating multiple subsystems for diverse needs. To address …

Cited by 17

TiVA: Time-aligned video-to-audio generation

X Wang, Y Wang, Y Wu, R Song, X Tan… - Proceedings of the …, 2024 - dl.acm.org

Video-to-audio generation is crucial for autonomous video editing and post-processing,
which aims to generate high-quality audio for silent videos with semantic similarity and …

Cited by 5

VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation

R Huang, Y Wang, R Hu, X Xu, Z Hong… - Proceedings of the …, 2024 - dl.acm.org

Voice large language models (LLMs) cast voice synthesis as a language modeling task in a
discrete space, and have demonstrated significant progress to date. Despite the recent …

Cited by 2

AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps

H Liu, R Huang, Y Liu, H Cao, J Wang… - Proceedings of the …, 2024 - dl.acm.org

Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the
forefront of various generative tasks. However, their iterative sampling process poses a …

Cited by 2

Tango 2: Aligning diffusion-based text-to-audio generative models through direct preference optimization

N Majumder, CY Hung, D Ghosal, WN Hsu… - ACM Multimedia …, 2024 - openreview.net

Generative multimodal content is increasingly prevalent in much of the content creation
arena, as it has the potential to allow artists and media personnel to create pre-production …

Cited by 4

Dance-to-music generation with encoder-based textual inversion of diffusion models

S Li, W Dong, Y Zhang, F Tang, C Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

The harmonious integration of music with dance movements is pivotal in vividly conveying
the artistic essence of dance. This alignment also significantly elevates the immersive quality …

Cited by 7

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

H Liu, J Wang, R Huang, Y Liu, H Lu, W Xue… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in latent diffusion models (LDMs) have markedly enhanced text-to-
audio generation, yet their iterative sampling processes impose substantial computational …

Cited by 1