Audio-Language Datasets of Scenes and Events: A Survey

G Wijngaard, E Formisano, M Esposito… - IEEE …, 2025 - ieeexplore.ieee.org
Audio-language models (ALMs) generate linguistic descriptions of sound-producing events
and scenes. Advances in dataset creation and computational power have led to significant …

Audio-Language Models for Audio-Centric Tasks: A survey

Y Su, J Bai, Q Xu, K Xu, Y Dou - arXiv preprint arXiv:2501.15177, 2025 - arxiv.org
Audio-Language Models (ALMs), which are trained on audio-text data, focus on the
processing, understanding, and reasoning of sounds. Unlike traditional supervised learning …