Large language models for time series: A survey

X Zhang, RR Chowdhury, RK Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have seen significant use in domains such as natural
language processing and computer vision. Going beyond text, image and graphics, LLMs …

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

High-fidelity audio compression with improved RVQGAN

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2023 - proceedings.neurips.cc
Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org
We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

MERT: Acoustic music understanding model with large-scale self-supervised training

Y Li, R Yuan, G Zhang, Y Ma, X Chen, H Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) has recently emerged as a promising paradigm for training
generalisable models on large-scale data in the fields of vision, text, and speech. Although …

Audio Flamingo: A novel audio language model with few-shot learning and dialogue abilities

Z Kong, A Goel, R Badlani, W Ping, R Valle… - arXiv preprint arXiv …, 2024 - arxiv.org
Augmenting large language models (LLMs) to understand audio--including non-speech
sounds and non-verbal speech--is critically important for diverse real-world applications of …

The Internet of Sounds: Convergent trends, insights, and future directions

L Turchet, M Lagrange, C Rottondi… - IEEE Internet of …, 2023 - ieeexplore.ieee.org
Current sound-based practices and systems developed in both academia and industry point
to convergent research trends that bring together the field of Sound and Music Computing …

The VoicePrivacy 2024 challenge evaluation plan

N Tomashenko, X Miao, P Champion, S Meyer… - arXiv preprint arXiv …, 2024 - arxiv.org
The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …

WavTokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling

S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models have been effectively applied to modeling natural signals, such as
images, video, speech, and audio. A crucial component of these models is the codec …

Music understanding LLaMA: Advancing text-to-music generation with question answering and captioning

S Liu, AS Hussain, C Sun… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale
publicly available music datasets with natural language captions. To address this, we …