Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Joint detection and classification of singing voice melody using convolutional recurrent neural networks

S Kum, J Nam - Applied Sciences, 2019 - mdpi.com
Singing melody extraction essentially involves two tasks: one is detecting the activity of a
singing voice in polyphonic music, and the other is estimating the pitch of a singing voice in …
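
The two subtasks named in this snippet, voice activity detection and pitch estimation, have to be merged into a single melody contour. The sketch below shows one way to do that, assuming a model that outputs a per-frame voicing probability and a per-frame distribution over pitch bins. It is not Kum and Nam's architecture. The function names, the 0.5 threshold and the bin layout are hypothetical choices made for illustration.

```python
# Hypothetical post-processing sketch: merge frame-wise voicing detection
# and pitch-bin estimation into one F0 contour (0.0 marks unvoiced frames).

def midi_to_hz(midi_note):
    """Convert a (possibly fractional) MIDI note number to Hz, A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def decode_melody(voicing_prob, pitch_posteriors, bin_freqs_hz, threshold=0.5):
    """voicing_prob: per-frame probability that a singing voice is active.
    pitch_posteriors: per-frame list of scores over the pitch bins.
    bin_freqs_hz: centre frequency of each pitch bin.
    """
    contour = []
    for p_voice, posterior in zip(voicing_prob, pitch_posteriors):
        # Pitch estimation: take the most likely bin in this frame.
        best_bin = max(range(len(posterior)), key=posterior.__getitem__)
        # Voicing detection: silence the frame if the voice is judged inactive.
        contour.append(bin_freqs_hz[best_bin] if p_voice >= threshold else 0.0)
    return contour

# Usage with semitone bins from MIDI 60 to 62 (hypothetical resolution).
bins = [midi_to_hz(m) for m in range(60, 63)]
print(decode_melody([0.9, 0.2], [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]], bins))
```

Keeping voicing as a separate decision, rather than folding it into a "no pitch" bin, lets the threshold be tuned on its own.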

Simultaneous separation and transcription of mixtures with multiple polyphonic and percussive instruments

E Manilow, P Seetharaman… - ICASSP 2020 - 2020 IEEE …, 2020 - ieeexplore.ieee.org
We present a single deep learning architecture that can both separate an audio recording of
a musical mixture into constituent single-instrument recordings and transcribe these …

A streamlined encoder/decoder architecture for melody extraction

TH Hsieh, L Su, YH Yang - ICASSP 2019 - 2019 IEEE …, 2019 - ieeexplore.ieee.org
Melody extraction in polyphonic musical audio is important for music signal processing. In
this paper, we propose a novel streamlined encoder/decoder network that is designed for …

Conditioned source separation for musical instrument performances

O Slizovskaia, G Haro, E Gómez - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
In music source separation, the number of sources may vary for each piece and some of the
sources may belong to the same family of instruments, thus sharing timbral characteristics …

Singing voice separation using a deep convolutional neural network trained by ideal binary mask and cross entropy

KWE Lin, BT Balamurali, E Koh, S Lui… - Neural Computing and …, 2020 - Springer
Separating a singing voice from its music accompaniment remains an important challenge in
the field of music information retrieval. We present a unique neural network approach …
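
The title of this entry names two standard ingredients: an ideal binary mask (IBM) used as the training target, and cross entropy as the loss. The sketch below uses a common textbook definition of the IBM. A time-frequency bin is assigned to the voice (1) when the voice magnitude exceeds the accompaniment magnitude, and to the accompaniment (0) otherwise. The loss is mean binary cross entropy against that target. This may not match Lin et al.'s exact formulation. The function names are hypothetical, and the spectrograms are plain nested lists of magnitudes.

```python
import math

def ideal_binary_mask(voice_mag, accomp_mag):
    """1.0 where the voice magnitude exceeds the accompaniment's, else 0.0."""
    return [[1.0 if v > a else 0.0 for v, a in zip(v_row, a_row)]
            for v_row, a_row in zip(voice_mag, accomp_mag)]

def binary_cross_entropy(pred_mask, target_mask, eps=1e-7):
    """Mean BCE between predicted mask probabilities and a binary target."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred_mask, target_mask):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count

def apply_mask(mixture_mag, mask):
    """Estimate the voice spectrogram by masking the mixture magnitudes."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture_mag)]

# Usage on a toy 2x2 spectrogram (rows = frames, columns = frequency bins).
voice = [[2.0, 0.1], [0.5, 3.0]]
accomp = [[1.0, 1.0], [0.6, 0.2]]
ibm = ideal_binary_mask(voice, accomp)
print(ibm, binary_cross_entropy([[0.5, 0.5], [0.5, 0.5]], ibm))
```

At inference time, the network's mask output would usually be thresholded or used directly to weight the mixture spectrogram, as `apply_mask` does here.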

Weakly informed audio source separation

K Schulze-Forster, C Doire, G Richard… - 2019 IEEE Workshop …, 2019 - ieeexplore.ieee.org
Prior information about the target source can improve audio source separation quality but is
usually not available with the necessary level of audio alignment. This has limited its …

Multi-task learning to enable location mention identification in the early hours of a crisis event

S Khanal, D Caragea - Findings of the Association for …, 2021 - aclanthology.org
Training a robust and reliable deep learning model requires a large amount of data. In the
crisis domain, building deep learning models to identify actionable information from the …

Modeling the compatibility of stem tracks to generate music mashups

J Huang, JC Wang, JBL Smith, X Song… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
A music mashup combines audio elements from two or more songs to create a new work. To
reduce the time and effort required to make them, researchers have developed algorithms …

Joint singing pitch estimation and voice separation based on a neural harmonic structure renderer

T Nakano, K Yoshii, Y Wu, R Nishikimi… - … IEEE Workshop on …, 2019 - ieeexplore.ieee.org
This paper describes a multi-task learning approach to joint extraction (fundamental
frequency (F0) estimation) and separation of singing voices from music signals. While deep …