- Academic Search

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by: 227 · All 6 versions

[PDF] arxiv.org

Diffusion Models for Image Restoration and Enhancement--A Comprehensive Survey

X Li, Y Ren, X Jin, C Lan, X Wang, W Zeng… - arXiv preprint arXiv …, 2023 - arxiv.org

Image restoration (IR) has been an indispensable and challenging task in the low-level
vision field, which strives to improve the subjective quality of images distorted by various …

Cited by: 81 · All 2 versions

[PDF] mlr.press

Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models

R Huang, J Huang, D Yang, Y Ren… - International …, 2023 - proceedings.mlr.press

Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …

Cited by: 315 · All 7 versions

[PDF] neurips.cc

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024 - proceedings.neurips.cc

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Cited by: 244 · All 8 versions

[PDF] aaai.org

AudioGPT: Understanding and generating speech, music, sound, and talking head

R Huang, M Li, D Yang, J Shi, X Chang, Z Ye… - Proceedings of the …, 2024 - ojs.aaai.org

Large language models (LLMs) have exhibited remarkable capabilities across a variety of
domains and tasks, challenging our understanding of learning and cognition. Despite the …

Cited by: 177 · All 5 versions

[PDF] thecvf.com

Universal guidance for diffusion models

A Bansal, HM Chu, A Schwarzschild… - Proceedings of the …, 2023 - openaccess.thecvf.com

Typical diffusion models are trained to accept a particular form of conditioning, most
commonly text, and cannot be conditioned on other modalities without retraining. In this …

Cited by: 120 · All 6 versions

[PDF] arxiv.org

Text-to-audio generation using instruction-tuned LLM and latent diffusion model

D Ghosal, N Majumder, A Mehrish, S Poria - arXiv preprint arXiv …, 2023 - arxiv.org

The immense scale of the recent large language models (LLM) allows many interesting
properties, such as, instruction- and chain-of-thought-based fine-tuning, that has significantly …

Cited by: 149 · All 3 versions

[PDF] arxiv.org

ProDiff: Progressive fast diffusion model for high-quality text-to-speech

R Huang, Z Zhao, H Liu, J Liu, C Cui… - Proceedings of the 30th …, 2022 - dl.acm.org

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading
performances in many generative tasks. However, the inherited iterative sampling process …

Cited by: 180 · All 3 versions

[PDF] thecvf.com

How to backdoor diffusion models?

SY Chou, PY Chen, TY Ho - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Diffusion models are state-of-the-art deep learning empowered generative models that are
trained based on the principle of learning forward and reverse diffusion processes via …

Cited by: 105 · All 7 versions

[PDF] thecvf.com

Editing implicit assumptions in text-to-image diffusion models

H Orgad, B Kawar, Y Belinkov - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Text-to-image diffusion models often make implicit assumptions about the world when
generating images. While some assumptions are useful (e.g., the sky is blue), they can also …

Cited by: 70 · All 5 versions

