A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT

Y Cao, S Li, Y Liu, Z Yan, Y Dai, PS Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …

A survey of AI-generated content (AIGC)

Y Cao, S Li, Y Liu, Z Yan, Y Dai, P Yu, L Sun - ACM Computing Surveys, 2025 - dl.acm.org
Recently, Artificial Intelligence Generated Content (AIGC) has gained significant attention
from society, especially with the rise of Generative AI (GAI) techniques such as ChatGPT …

MuLan: A joint embedding of music audio and natural language

Q Huang, A Jansen, J Lee, R Ganti, JY Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Music tagging and content-based retrieval systems have traditionally been constructed
using pre-defined ontologies covering a rigid set of music attributes or text queries. This …

MARBLE: Music audio representation benchmark for universal evaluation

R Yuan, Y Ma, Y Li, G Zhang, X Chen… - Advances in …, 2023 - proceedings.neurips.cc
In the era of extensive intersection between art and Artificial Intelligence (AI), such as image
generation and fiction co-creation, AI for music remains relatively nascent, particularly in …

The Song Describer dataset: a corpus of audio captions for music-and-language evaluation

I Manco, B Weck, S Doh, M Won, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality
audio-caption pairs, designed for the evaluation of music-and-language models. The …

Contrastive audio-language learning for music

I Manco, E Benetos, E Quinton, G Fazekas - arXiv preprint arXiv …, 2022 - arxiv.org
As one of the most intuitive interfaces known to humans, natural language has the potential
to mediate many tasks that involve human-computer interaction, especially in application …

Supervised and unsupervised learning of audio representations for music understanding

MC McCallum, F Korzeniowski, S Oramas… - arXiv preprint arXiv …, 2022 - arxiv.org
In this work, we provide a broad comparative analysis of strategies for pre-training audio
understanding models for several tasks in the music domain, including labelling of genre …

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Multi-source diffusion models for simultaneous music generation and separation

G Mariani, I Tallini, E Postolache, M Mancusi… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we define a diffusion-based generative model capable of both music synthesis
and source separation by learning the score of the joint probability density of sources …

Toward universal text-to-music retrieval

SH Doh, M Won, K Choi, J Nam - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
This paper introduces effective design choices for text-to-music retrieval systems. An ideal
text-based retrieval system would support various input queries such as pre-defined tags …