Foundation models for music: A survey
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …
MERT: Acoustic music understanding model with large-scale self-supervised training
Self-supervised learning (SSL) has recently emerged as a promising paradigm for training
generalisable models on large-scale data in the fields of vision, text, and speech. Although …
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …
MARBLE: Music audio representation benchmark for universal evaluation
In the era of extensive intersection between art and Artificial Intelligence (AI), such as image
generation and fiction co-creation, AI for music remains relatively nascent, particularly in …
LyricWhiz: Robust multilingual zero-shot lyrics transcription by whispering to ChatGPT
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription
method achieving state-of-the-art performance on various lyrics transcription datasets, even …
On the effectiveness of speech self-supervised learning for music
Self-supervised learning (SSL) has shown promising results in various speech and natural
language processing applications. However, its efficacy in music information retrieval (MIR) …
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Self-supervised learning (SSL) has recently emerged as a promising paradigm for training
generalisable models on large-scale data in the fields of vision, text, and speech. Although …
Learning music representations with wav2vec 2.0
Learning music representations that are general-purpose offers the flexibility to finetune
several downstream tasks using smaller datasets. The wav2vec 2.0 speech representation …
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization
Recent years have witnessed the success of foundation models pre-trained with self-
supervised learning (SSL) in various music informatics understanding tasks, including music …
Unsupervised Musical Object Discovery from Audio
Current object-centric learning models such as the popular SlotAttention architecture allow
for unsupervised visual scene decomposition. Our novel MusicSlots method adapts …