Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Simple and controllable music generation

J Copet, F Kreuk, I Gat, T Remez… - Advances in …, 2023 - proceedings.neurips.cc
We tackle the task of conditional music generation. We introduce MusicGen, a single
Language Model (LM) that operates over several streams of compressed discrete music …

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Textually pretrained speech language models

M Hassid, T Remez, TA Nguyen, I Gat… - Advances in …, 2023 - proceedings.neurips.cc
Speech language models (SpeechLMs) process and generate acoustic data only, without
textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using …

SoundStorm: Efficient parallel audio generation

Z Borsos, M Sharifi, D Vincent, E Kharitonov… - arXiv preprint arXiv …, 2023 - arxiv.org
We present SoundStorm, a model for efficient, non-autoregressive audio generation.
SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional …

Music ControlNet: Multiple time-varying controls for music generation

SL Wu, C Donahue, S Watanabe… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Text-to-music generation models are now capable of generating high-quality music audio in
broad styles. However, text control is primarily suitable for the manipulation of global musical …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

ChatMusician: Understanding and generating music intrinsically with LLM

R Yuan, H Lin, Y Wang, Z Tian, S Wu, T Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
While Large Language Models (LLMs) demonstrate impressive capabilities in text
generation, we find that their ability has yet to be generalized to music, humanity's creative …

Masked audio generation using a single non-autoregressive transformer

A Ziv, I Gat, GL Lan, T Remez, F Kreuk… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce MAGNeT, a masked generative sequence modeling method that operates
directly over several streams of audio tokens. Unlike prior work, MAGNeT is comprised of a …

Multi-source diffusion models for simultaneous music generation and separation

G Mariani, I Tallini, E Postolache, M Mancusi… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we define a diffusion-based generative model capable of both music synthesis
and source separation by learning the score of the joint probability density of sources …