Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Contrastive audio-language learning for music

I Manco, E Benetos, E Quinton, G Fazekas - arXiv preprint arXiv …, 2022 - arxiv.org
As one of the most intuitive interfaces known to humans, natural language has the potential
to mediate many tasks that involve human-computer interaction, especially in application …

MSCCov19Net: multi-branch deep learning model for COVID-19 detection from cough sounds

S Ulukaya, AA Sarıca, O Erdem, A Karaali - Medical & Biological …, 2023 - Springer
Coronavirus has an impact on millions of lives and has been added to the important
pandemics that continue to affect with its variants. Since it is transmitted through the …

Multimodal music information processing and retrieval: Survey and future challenges

F Simonetta, S Ntalampiras… - … workshop on multilayer …, 2019 - ieeexplore.ieee.org
Towards improving the performance in various music information processing tasks, recent
studies exploit different modalities able to capture diverse aspects of music. Such modalities …

Audio-based musical version identification: Elements and challenges

F Yesiler, G Doras, RM Bittner… - IEEE Signal …, 2021 - ieeexplore.ieee.org
Creating novel interpretations of existing musical compositions is and has always been an
essential part of musical practice. Before the advent of recorded music, listening to a piece of …

Query by Video: Cross-modal Music Retrieval.

B Li, A Kumar - ISMIR, 2019 - academia.edu
Cross-modal retrieval learns the relationship between the two types of data in a common
space so that an input from one modality can retrieve data from a different modality. We …

How blind and visually impaired composers, producers, and songwriters leverage and adapt music technology

WC Payne, AY Xu, F Ahmed, L Ye, A Hurst - Proceedings of the 22nd …, 2020 - dl.acm.org
Today, music creation software and hardware are central to the workflow of most
professional composers, producers, and songwriters. Music is an aural art form, but it is …

Cross-modal music-video recommendation: A study of design choices

L Prétet, G Richard, G Peeters - 2021 International Joint …, 2021 - ieeexplore.ieee.org
In this work, we study music/video cross-modal recommendation, i.e., recommending a music
track for a video or vice versa. We rely on a self-supervised learning paradigm to learn from …

Erkomaishvili Dataset: A curated corpus of traditional Georgian vocal music for computational musicology

S Rosenzweig, F Scherbaum… - Transactions of the …, 2020 - transactions.ismir.net
The analysis of recorded audio material using computational methods has received
increased attention in ethnomusicological research. We present a curated dataset of …

Learning explicit and implicit dual common subspaces for audio-visual cross-modal retrieval

D Zeng, J Wu, G Hattori, R Xu, Y Yu - ACM Transactions on Multimedia …, 2023 - dl.acm.org
Audio-visual tracks in video contain rich semantic information with potential in many
applications and research. Since the audio-visual data have inconsistent distributions and …