Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Separate anything you describe

X Liu, Q Kong, Y Zhao, H Liu, Y Yuan… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Language-queried audio source separation (LASS) is a new paradigm for computational
auditory scene analysis (CASA). LASS aims to separate a target sound from an audio …

Few-shot class-incremental audio classification via discriminative prototype learning

W Xie, Y Li, Q He, W Cao - Expert Systems with Applications, 2023 - Elsevier
In real-world scenarios, new audio classes with insufficient samples usually emerge
continually, which motivates the study of few-shot class-incremental audio classification …

Source Separation of Piano Concertos with Test-Time Adaptation

Y Özer, M Müller - ISMIR, 2022 - audiolabs-erlangen.de
Music source separation (MSS) aims at decomposing a music recording into its constituent
sources, such as a lead instrument and the accompaniment. Despite the difficulties in MSS …

A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

KN Watcharasupat, A Lerch - arXiv preprint arXiv:2406.18747, 2024 - arxiv.org
Despite significant recent progress across multiple subtasks of audio source separation, few
music source separation systems support separation beyond the four-stem vocals, drums …

LC-Protonets: Multi-label Few-shot learning for world music audio tagging

C Papaioannou, E Benetos… - IEEE Open Journal of …, 2025 - ieeexplore.ieee.org
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the
problem of multi-label few-shot classification, where a model must generalize to new classes …

Task-Aware Unified Source Separation

K Saijo, J Ebbers, FG Germain, G Wichern… - arXiv preprint arXiv …, 2024 - arxiv.org
Several attempts have been made to handle multiple source separation tasks such as
speech enhancement, speech separation, sound event separation, music source separation …

Cross-Domain Contrastive Learning-Based Few-Shot Underwater Acoustic Target Recognition

X Cui, Z He, Y Xue, K Tang, P Zhu, J Han - Journal of Marine Science …, 2024 - mdpi.com
Underwater Acoustic Target Recognition (UATR) plays a crucial role in underwater detection
devices. However, due to the difficulty and high cost of collecting data in the underwater …

Selective Annotation of Few Data for Beat Tracking of Latin American Music Using Rhythmic Features

LS Maia, M Rocamora, LWP Biscainho… - Transactions of the …, 2024 - transactions.ismir.net
Training state-of-the-art beat tracking models usually requires large amounts of annotated
data. It is widely known that data annotation is a time-consuming process and generally …

Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries

KN Watcharasupat, A Lerch - arXiv preprint arXiv:2501.16171, 2025 - arxiv.org
Music source separation is an audio-to-audio retrieval task of extracting one or more
constituent components, or composites thereof, from a musical audio mixture. Each of these …