Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Music controlnet: Multiple time-varying controls for music generation

SL Wu, C Donahue, S Watanabe… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Text-to-music generation models are now capable of generating high-quality music audio in
broad styles. However, text control is primarily suitable for the manipulation of global musical …

Mert: Acoustic music understanding model with large-scale self-supervised training

Y Li, R Yuan, G Zhang, Y Ma, X Chen, H Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) has recently emerged as a promising paradigm for training
generalisable models on large-scale data in the fields of vision, text, and speech. Although …

Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions

YS Huang, YH Yang - Proceedings of the 28th ACM international …, 2020 - dl.acm.org
A great number of deep-learning-based models have recently been proposed for automatic
music composition. Among these models, the Transformer stands out as a prominent …

Compound word transformer: Learning to compose full-song music over dynamic directed hypergraphs

WY Hsiao, JY Liu, YC Yeh, YH Yang - Proceedings of the AAAI …, 2021 - ojs.aaai.org
To apply neural sequence models such as the Transformers to music generation tasks, one
has to represent a piece of music by a sequence of tokens drawn from a finite set of pre …

Acoustic data detection in large-scale emergency vehicle sirens and road noise dataset

MY Shams, T Abd El-Hafeez, E Hassan - Expert Systems with Applications, 2024 - Elsevier
This paper presents a novel deep learning model called Self-Attention Layer within a
Convolutional Neural Network (SACNN), specifically designed for detecting acoustic data in …

Music2dance: Dancenet for music-driven dance generation

W Zhuang, C Wang, J Chai, Y Wang, M Shao… - ACM Transactions on …, 2022 - dl.acm.org
Synthesizing human motion from music (i.e., music to dance) is appealing and has attracted
much research interest in recent years. It is challenging because of the requirement for …

Marble: Music audio representation benchmark for universal evaluation

R Yuan, Y Ma, Y Li, G Zhang, X Chen… - Advances in …, 2023 - proceedings.neurips.cc
In the era of extensive intersection between art and Artificial Intelligence (AI), such as image
generation and fiction co-creation, AI for music remains relatively nascent, particularly in …

A comprehensive review on music transcription

B Bhattarai, J Lee - Applied Sciences, 2023 - mdpi.com
Music transcription is the process of transforming recorded sound of musical performances
into symbolic representations such as sheet music or MIDI files. Extensive research and …

MuseMorphose: Full-song and fine-grained piano music style transfer with one transformer VAE

SL Wu, YH Yang - IEEE/ACM Transactions on Audio, Speech …, 2023 - ieeexplore.ieee.org
Transformers and variational autoencoders (VAEs) have been extensively employed for
symbolic (e.g., MIDI) domain music generation. While the former boast an impressive …