Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

A comprehensive survey and analysis of generative models in machine learning

GM Harshvardhan, MK Gourisaria, M Pandey… - Computer Science …, 2020 - Elsevier
Generative models have been in existence for many decades. In the field of machine
learning, we come across many scenarios when directly learning a target is intractable …

Jukebox: A generative model for music

P Dhariwal, H Jun, C Payne, JW Kim… - arXiv preprint arXiv …, 2020 - assets.pubpub.org
We introduce Jukebox, a model that generates music with singing in the raw audio domain.
We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete …

MelGAN: Generative adversarial networks for conditional waveform synthesis

K Kumar, R Kumar, T De Boissiere… - Advances in neural …, 2019 - proceedings.neurips.cc
Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating
coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is …

MERT: Acoustic music understanding model with large-scale self-supervised training

Y Li, R Yuan, G Zhang, Y Ma, X Chen, H Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) has recently emerged as a promising paradigm for training
generalisable models on large-scale data in the fields of vision, text, and speech. Although …

DDSP: Differentiable digital signal processing

J Engel, L Hantrakul, C Gu, A Roberts - arXiv preprint arXiv:2001.04643, 2020 - arxiv.org
Most generative models of audio directly generate samples in one of two domains: time or
frequency. While sufficient to express any signal, these representations are inefficient, as …