Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Wave-u-net: A multi-scale neural network for end-to-end audio source separation

D Stoller, S Ewert, S Dixon - arXiv preprint arXiv:1806.03185, 2018 - arxiv.org
Models for audio source separation usually operate on the magnitude spectrum, which
ignores phase information and makes separation performance dependent on hyper …
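The phase limitation named in the Wave-U-Net snippet can be illustrated with a small sketch. All of the following are assumptions for illustration, not the paper's method: a single 8-sample frame, a naive DFT standing in for an STFT, and an oracle (perfect) source magnitude standing in for a model's estimate. `dft`, `idft` and `reconstruct_with_mixture_phase` are hypothetical helpers. The estimate reuses the mixture phase, as magnitude-spectrum separators typically do, so an interfering sinusoid in the same frequency bin still causes a reconstruction error even though the magnitude is exact.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for one STFT frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part, assuming a conjugate-symmetric spectrum."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def reconstruct_with_mixture_phase(mixture, target_magnitude):
    """Hypothetical helper: pair an estimated magnitude with the mixture's phase,
    which is what a magnitude-only separator implicitly does at resynthesis."""
    mix_spec = dft(mixture)
    est_spec = [mag * cmath.exp(1j * cmath.phase(X))
                for mag, X in zip(target_magnitude, mix_spec)]
    return idft(est_spec)


n = 8
source = [math.cos(2 * math.pi * t / n) for t in range(n)]
# Interference in the same frequency bin, 90 degrees out of phase with the source.
interference = [math.sin(2 * math.pi * t / n) for t in range(n)]
mixture = [s + i for s, i in zip(source, interference)]

# Oracle magnitude: the best any magnitude-spectrum model could estimate.
oracle_magnitude = [abs(S) for S in dft(source)]
estimate = reconstruct_with_mixture_phase(mixture, oracle_magnitude)
max_error = max(abs(e - s) for e, s in zip(estimate, source))
print(f"max sample error with perfect magnitude: {max_error:.3f}")
# prints: max sample error with perfect magnitude: 0.707
```

Because the remaining error comes only from the borrowed mixture phase, this is one way to see why end-to-end, waveform-domain models such as the one in the title sidestep the magnitude-spectrum front-end.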

Source separation in ecoacoustics: A roadmap towards versatile soundscape information retrieval

TH Lin, Y Tsao - Remote Sensing in Ecology and Conservation, 2020 - Wiley Online Library
A comprehensive assessment of ecosystem dynamics requires the monitoring of biological,
physical and social changes. Changes that cannot be observed visually may be trackable …

Co-separating sounds of visual objects

R Gao, K Grauman - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
Learning how objects sound from video is challenging, since they often heavily overlap in a
single audio channel. Current methods for visually-guided audio source separation sidestep …

Demucs: Deep extractor for music sources with extra unlabeled data remixed

A Défossez, N Usunier, L Bottou, F Bach - arXiv preprint arXiv:1909.01174, 2019 - arxiv.org
We study the problem of source separation for music using deep learning with four known
sources: drums, bass, vocals and other accompaniments. State-of-the-art approaches …
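The four-source setting in the Demucs snippet is commonly written as an additive mixture. The notation below is an illustrative assumption, not taken from the abstract: $x$ is the mixture waveform, $s_j$ are the four source waveforms, and $f_\theta$ is a hypothetical separator that maps the mixture to one estimate per source.

```latex
x(t) = s_{\mathrm{drums}}(t) + s_{\mathrm{bass}}(t) + s_{\mathrm{vocals}}(t) + s_{\mathrm{other}}(t),
\qquad
\bigl(\hat{s}_{\mathrm{drums}}, \hat{s}_{\mathrm{bass}}, \hat{s}_{\mathrm{vocals}}, \hat{s}_{\mathrm{other}}\bigr) = f_\theta(x)
```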

A differentiable perceptual audio metric learned from just noticeable differences

P Manocha, A Finkelstein, R Zhang, NJ Bryan… - arXiv preprint arXiv …, 2020 - arxiv.org
Many audio processing tasks require perceptual assessment. The "gold standard" of
obtaining human judgments is time-consuming, expensive, and cannot be used as an …

Score-informed source separation of choral music

M Gover - 2020 - escholarship.mcgill.ca
Sound source separation consists of extracting one or more sources of significant
interest from a recording containing several sound sources. These …

SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning

M Delcroix, JB Vázquez, T Ochiai… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
In many situations, we would like to hear desired sound events (SEs) while being able to
ignore interference. Target sound extraction (TSE) tackles this problem by estimating the …

Source separation with weakly labelled data: An approach to computational auditory scene analysis

Q Kong, Y Wang, X Song, Y Cao… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Source separation is the task of separating an audio recording into individual sound
sources. Source separation is fundamental for computational auditory scene analysis …

Snore-GANs: Improving automatic snore sound classification with synthesized data

Z Zhang, J Han, K Qian, C Janott… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
One of the frontier issues that severely hamper the development of automatic snore sound
classification (ASSC) is the lack of sufficient supervised training data. To cope …