A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 242 · All 7 versions

[PDF] arxiv.org

On the challenges and opportunities in generative AI

L Manduchi, K Pandey, R Bamler, R Cotterell… - arXiv preprint arXiv …, 2024 - arxiv.org

The field of deep generative modeling has grown rapidly and consistently over the years.
With the availability of massive amounts of training data coupled with advances in scalable …

Cited by 20 · All 4 versions

[PDF] arxiv.org

AudioLDM: Text-to-audio generation with latent diffusion models

H Liu, Z Chen, Y Yuan, X Mei, X Liu, D Mandic… - arXiv preprint arXiv …, 2023 - arxiv.org

Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general
audio based on text descriptions. However, previous studies in TTA have limited generation …

Cited by 571 · All 9 versions

AudioLDM 2: Learning holistic audio generation with self-supervised pretraining

H Liu, Y Yuan, X Liu, X Mei, Q Kong… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …

Cited by 143 · All 8 versions

[PDF] arxiv.org

Diffsound: Discrete diffusion model for text-to-sound generation

D Yang, J Yu, H Wang, W Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Generating sound effects that people want is an important topic. However, there are limited
studies in this area for sound generation. In this study, we investigate generating sound …

Cited by 318 · All 5 versions

[PDF] neurips.cc

StyleTTS 2: Towards human-level text-to-speech through style diffusion and adversarial training with large speech language models

YA Li, C Han, V Raghavan… - Advances in Neural …, 2023 - proceedings.neurips.cc

In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style
diffusion and adversarial training with large speech language models (SLMs) to achieve …

Cited by 107 · All 8 versions

[PDF] arxiv.org

BigVGAN: A universal neural vocoder with large-scale training

S Lee, W Ping, B Ginsburg, B Catanzaro… - arXiv preprint arXiv …, 2022 - arxiv.org

Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …

Cited by 247 · All 6 versions

[PDF] thecvf.com

Deblurring via stochastic refinement

J Whang, M Delbracio, H Talebi… - Proceedings of the …, 2022 - openaccess.thecvf.com

Image deblurring is an ill-posed problem with multiple plausible solutions for a given input
image. However, most existing methods produce a deterministic estimate of the clean image …

Cited by 299 · All 9 versions

[PDF] arxiv.org

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 471 · All 2 versions

[PDF] neurips.cc

Lossy image compression with conditional diffusion models

R Yang, S Mandt - Advances in Neural Information …, 2023 - proceedings.neurips.cc

This paper outlines an end-to-end optimized lossy image compression framework using
diffusion generative models. The approach relies on the transform coding paradigm, where …

Cited by 112 · All 7 versions

Priorgrad: Improving conditional denoising diffusion models with data-dependent adaptive prior

A review of deep learning techniques for speech processing‏
