The Song Describer Dataset: a corpus of audio captions for music-and-language evaluation
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality
audio-caption pairs, designed for the evaluation of music-and-language models. The …
LP-MusicCaps: LLM-based pseudo music captioning
Automatic music captioning, which generates natural language descriptions for given music
tracks, holds significant potential for enhancing the understanding and organization of large …
MuseChat: A conversational music recommendation system for videos
Music recommendation for videos attracts growing interest in multi-modal research.
However, existing systems focus primarily on content compatibility, often ignoring the users' …
WikiMuTe: A web-sourced dataset of semantic descriptions for music audio
Multi-modal deep learning techniques for matching free-form text with music have shown
promising results in the field of Music Information Retrieval (MIR). Prior work is often based …
Can Impressions of Music be Extracted from Thumbnail Images?
T Harada, T Motomitsu, K Hayashi, Y Sakai… - arXiv preprint arXiv …, 2025 - arxiv.org
In recent years, there has been a notable increase in research on machine learning models
for music retrieval and generation systems that are capable of taking natural language …
Annotator Subjectivity in the MusicCaps Dataset.
Musical caption, when expressed in free-form text as opposed to more structured and limited
musical tags, often encompasses the individual characteristics of the annotator, thereby …
Zero-Shot Structure Labeling with Audio And Language Model Embeddings
Recent progress on audio-based music structure analysis has closely aligned with the
appearance of new deep learning paradigms, notably for the extraction of robust spectro …