WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

OmniBench: Towards the future of universal omni-language models

Y Li, G Zhang, Y Ma, R Yuan, K Zhu, H Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in multimodal large language models (MLLMs) have aimed to
integrate and interpret data across diverse modalities. However, the capacity of these …

From Audio Deepfake Detection to AI-Generated Music Detection--A Pathway and Overview

Y Li, M Milling, L Specia, BW Schuller - arXiv preprint arXiv:2412.00571, 2024 - arxiv.org
As Artificial Intelligence (AI) technologies continue to evolve, their use in generating realistic,
contextually appropriate content has expanded into various domains. Music, an art form and …

LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment

H Zhang, V Cheung, H Nishioka, S Dixon… - arXiv preprint arXiv …, 2024 - arxiv.org
Research in music understanding has extensively explored composition-level attributes
such as key, genre, and instrumentation through advanced representations, leading to cross …

LC-Protonets: Multi-label Few-shot learning for world music audio tagging

C Papaioannou, E Benetos… - IEEE Open Journal of …, 2025 - ieeexplore.ieee.org
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the
problem of multi-label few-shot classification, where a model must generalize to new classes …

Seeing the Sound: Multilingual Lip Sync for Real-Time Face-to-Face Translation

A Rafiei Oskooei, MS Aktaş, M Keleş - Computers, 2024 - mdpi.com
Imagine a future where language is no longer a barrier to real-time conversations, enabling
instant and lifelike communication across the globe. As cultural boundaries blur, the demand …

Innovation, data colonialism and ethics: critical reflections on the impacts of AI on Irish traditional music

E Kanhov, AK Kaila, BLT Sturm - Journal of New Music Research, 2024 - Taylor & Francis
By definition, traditional music is in a constant state of friction with innovation, exemplified by
resistance to 'outside' influences such as different instruments, different ways of learning, and …

Music Genre Classification using Large Language Models

MEA Meguenani, AS Britto Jr, AL Koerich - arXiv preprint arXiv …, 2024 - arxiv.org
This paper exploits the zero-shot capabilities of pre-trained large language models (LLMs)
for music genre classification. The proposed approach splits audio signals into 20 ms …

Hierarchical Symbolic Pop Music Generation with Graph Neural Networks

WQ Lim, J Liang, H Zhang - arXiv preprint arXiv:2409.08155, 2024 - arxiv.org
Music is inherently made up of complex structures, and representing them as graphs helps
to capture multiple levels of relationships. While music generation has been explored using …

Towards Music Industry 5.0: Perspectives on Artificial Intelligence

A Williams, M Barthet - 2025 - researchgate.net
Artificial Intelligence (AI) is a disruptive technology that is transforming many industries
including the music industry. Recently, the concept of Industry 5.0 has been proposed …