Tango 2: Aligning diffusion-based text-to-audio generations through direct preference optimization
Generative multimodal content is increasingly prevalent in much of the content creation
arena, as it has the potential to allow artists and media personnel to create pre-production …
Mustango: Toward controllable text-to-music generation
With recent advancements in text-to-audio and text-to-music based on latent diffusion
models, the quality of generated content has been reaching new heights. The controllability …
Loop Copilot: Conducting AI ensembles for music generation and iterative editing
Creating music is iterative, requiring varied methods at each stage. However, existing AI
music systems fall short in orchestrating multiple subsystems for diverse needs. To address …
TiVA: Time-aligned video-to-audio generation
Video-to-audio generation is crucial for autonomous video editing and post-processing,
which aims to generate high-quality audio for silent videos with semantic similarity and …
VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation
Voice large language models (LLMs) cast voice synthesis as a language modeling task in a
discrete space, and have demonstrated significant progress to date. Despite the recent …
AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps
Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the
forefront of various generative tasks. However, their iterative sampling process poses a …
Tango 2: Aligning diffusion-based text-to-audio generative models through direct preference optimization
Generative multimodal content is increasingly prevalent in much of the content creation
arena, as it has the potential to allow artists and media personnel to create pre-production …
Dance-to-music generation with encoder-based textual inversion of diffusion models
The harmonious integration of music with dance movements is pivotal in vividly conveying
the artistic essence of dance. This alignment also significantly elevates the immersive quality …
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
Recent advancements in latent diffusion models (LDMs) have markedly enhanced text-to-
audio generation, yet their iterative sampling processes impose substantial computational …
Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
T Xu, J Li, X Chen, X Yao, S Liu - arXiv preprint arXiv:2405.02801, 2024 - arxiv.org
In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements,
facilitating the generation of music, images, and other forms of artistic expression across …