Foundation models for music: A survey
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …
Joint detection and classification of singing voice melody using convolutional recurrent neural networks
Singing melody extraction essentially involves two tasks: one is detecting the activity of a
singing voice in polyphonic music, and the other is estimating the pitch of a singing voice in …
Simultaneous separation and transcription of mixtures with multiple polyphonic and percussive instruments
We present a single deep learning architecture that can both separate an audio recording of
a musical mixture into constituent single-instrument recordings and transcribe these …
A streamlined encoder/decoder architecture for melody extraction
Melody extraction in polyphonic musical audio is important for music signal processing. In
this paper, we propose a novel streamlined encoder/decoder network that is designed for …
Conditioned source separation for musical instrument performances
In music source separation, the number of sources may vary for each piece and some of the
sources may belong to the same family of instruments, thus sharing timbral characteristics …
Singing voice separation using a deep convolutional neural network trained by ideal binary mask and cross entropy
Separating a singing voice from its music accompaniment remains an important challenge in
the field of music information retrieval. We present a unique neural network approach …
Weakly informed audio source separation
Prior information about the target source can improve audio source separation quality but is
usually not available with the necessary level of audio alignment. This has limited its …
Multi-task learning to enable location mention identification in the early hours of a crisis event
Training a robust and reliable deep learning model requires a large amount of data. In the
crisis domain, building deep learning models to identify actionable information from the …
Modeling the compatibility of stem tracks to generate music mashups
A music mashup combines audio elements from two or more songs to create a new work. To
reduce the time and effort required to make them, researchers have developed algorithms …
Joint singing pitch estimation and voice separation based on a neural harmonic structure renderer
This paper describes a multi-task learning approach to joint extraction (fundamental
frequency (F0) estimation) and separation of singing voices from music signals. While deep …