Diff-BGM: A diffusion model for video background music generation

S Li, Y Qin, M Zheng, X … - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
When editing a video, a piece of attractive background music is indispensable. However,
video background music generation tasks face several challenges, for example the lack of …

Zero-shot unsupervised and text-based audio editing using DDPM inversion

H Manor, T Michaeli - arXiv preprint arXiv:2402.10009, 2024 - arxiv.org
Editing signals using large pre-trained models, in a zero-shot manner, has recently seen
rapid advancements in the image domain. However, this wave has yet to reach the audio …

Dance-to-Music Generation with Encoder-based Textual Inversion

S Li, W Dong, Y Zhang, F Tang, C Ma… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
The seamless integration of music with dance movements is essential for communicating the
artistic intent of a dance piece. This alignment also significantly improves the immersive …

Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning

FD Tsai, SL Wu, H Kim, BY Chen, HC Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-music models allow users to generate nearly realistic musical audio with textual
commands. However, editing music audios remains challenging due to the conflicting …

PAGURI: a user experience study of creative interaction with text-to-music models

F Ronchini, L Comanducci, G Perego… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, text-to-music models have been the biggest breakthrough in automatic
music generation. While they are unquestionably a showcase of technological progress, it is …

Prompt-guided precise audio editing with diffusion models

M Xu, C Li, D Su, W Liang, D Yu - arXiv preprint arXiv:2406.04350, 2024 - arxiv.org
Audio editing involves the arbitrary manipulation of audio content through precise control.
Although text-guided diffusion models have made significant advancements in text-to-audio …

Audio conditioning for music generation via discrete bottleneck features

S Rouard, Y Adi, J Copet, A Roebel… - arXiv preprint arXiv …, 2024 - arxiv.org
While most music generation models use textual or parametric conditioning (e.g., tempo,
harmony, musical genre), we propose to condition a language-model-based music …

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

H Liu, J Wang, X Li, R Huang, Y Liu, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-guided diffusion models make a paradigm shift in audio generation, facilitating the
adaptability of source audio to conform to specific textual prompts. Recent works introduce …

TAS: Personalized Text-guided Audio Spatialization

Z Li, B Zhao, Y Yuan - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Synthesizing binaural audio according to personalized requirements is crucial for building
immersive artificial spaces. Previous methods employ the visual modality to guide audio …

Personalized Image Generation with Large Multimodal Models

Y Xu, W Wang, Y Zhang, T Biao, P Yan, F Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Personalized content filtering, such as recommender systems, has become a critical
infrastructure to alleviate information overload. However, these systems merely filter existing …