- Academic Search

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 240 · 7 versions

[PDF] ieee.org

Deep generative modelling: A comparative review of vaes, gans, normalizing flows, energy-based and autoregressive models

S Bond-Taylor, A Leach, Y Long… - IEEE transactions on …, 2021 - ieeexplore.ieee.org

Deep generative models are a class of techniques that train deep neural networks to model
the distribution of training samples. Research has fragmented into various interconnected …

Cited by 657 · 13 versions

[PDF] neurips.cc

High-fidelity audio compression with improved rvqgan

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2023 - proceedings.neurips.cc

Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

Cited by 252 · 5 versions

[PDF] arxiv.org

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org

We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

Cited by 707 · 3 versions

[PDF] arxiv.org

Audiolm: a language modeling approach to audio generation

Z Borsos, R Marinier, D Vincent… - … ACM transactions on …, 2023 - ieeexplore.ieee.org

We introduce AudioLM, a framework for high-quality audio generation with long-term
consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts …

Cited by 603 · 6 versions

[PDF] arxiv.org

Diffsound: Discrete diffusion model for text-to-sound generation

D Yang, J Yu, H Wang, W Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Generating sound effects that people want is an important topic. However, there are limited
studies in this area for sound generation. In this study, we investigate generating sound …

Cited by 312 · 5 versions

[PDF] arxiv.org

Bigvgan: A universal neural vocoder with large-scale training

S Lee, W Ping, B Ginsburg, B Catanzaro… - arXiv preprint arXiv …, 2022 - arxiv.org

Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …

Cited by 242 · 6 versions

[PDF] arxiv.org

Soundstream: An end-to-end neural audio codec

N Zeghidour, A Luebs, A Omran… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

We present SoundStream, a novel neural audio codec that can efficiently compress speech,
music and general audio at bitrates normally targeted by speech-tailored codecs …

Cited by 729 · 6 versions

[PDF] neurips.cc

Diff-foley: Synchronized video-to-audio synthesis with latent diffusion models

S Luo, C Yan, C Hu, H Zhao - Advances in Neural …, 2023 - proceedings.neurips.cc

The Video-to-Audio (V2A) model has recently gained attention for its practical
application in generating audio directly from silent videos, particularly in video/film …

Cited by 72 · 6 versions

[PDF] mlr.press

Grad-tts: A diffusion probabilistic model for text-to-speech

V Popov, I Vovk, V Gogoryan… - International …, 2021 - proceedings.mlr.press

Recently, denoising diffusion probabilistic models and generative score matching have
shown high potential in modelling complex data distributions while stochastic calculus has …

Cited by 569 · 5 versions

Melgan: Generative adversarial networks for conditional waveform synthesis

A review of deep learning techniques for speech processing
