Diffusion models in vision: A survey

FA Croitoru, V Hondru, RT Ionescu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Denoising diffusion models represent a recent emerging topic in computer vision,
demonstrating remarkable results in the area of generative modeling. A diffusion model is a …

Diffusion models: A comprehensive survey of methods and applications

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

Adding conditional control to text-to-image diffusion models

L Zhang, A Rao, M Agrawala - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
We present ControlNet, a neural network architecture to add spatial conditioning controls to
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …

Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation

H Wang, X Du, J Li, RA Yeh… - Proceedings of the …, 2023 - openaccess.thecvf.com
A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule
on the learned gradients, and back-propagate the score of a diffusion model through the …

Stable video diffusion: Scaling latent video diffusion models to large datasets

A Blattmann, T Dockhorn, S Kulal… - arxiv preprint arxiv …, 2023 - arxiv.org
We present Stable Video Diffusion-a latent video diffusion model for high-resolution, state-of-
the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …

Imagen video: High definition video generation with diffusion models

J Ho, W Chan, C Saharia, J Whang, R Gao… - arxiv preprint arxiv …, 2022 - arxiv.org
We present Imagen Video, a text-conditional video generation system based on a cascade
of video diffusion models. Given a text prompt, Imagen Video generates high definition …

Muse: Text-to-image generation via masked generative transformers

H Chang, H Zhang, J Barber, AJ Maschinot… - arxiv preprint arxiv …, 2023 - arxiv.org
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image
generation performance while being significantly more efficient than diffusion or …

Diffusion self-guidance for controllable image generation

D Epstein, A Jabri, B Poole, A Efros… - Advances in Neural …, 2023 - proceedings.neurips.cc
Large-scale generative models are capable of producing high-quality images from detailed
prompts. However, many aspects of an image are difficult or impossible to convey through …

On distillation of guided diffusion models

C Meng, R Rombach, R Gao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Classifier-free guided diffusion models have recently been shown to be highly effective at
high-resolution image generation, and they have been widely used in large-scale diffusion …

ChatGPT is not all you need. A State of the Art Review of large Generative AI models

R Gozalo-Brizuela, EC Garrido-Merchan - arxiv preprint arxiv:2301.04655, 2023 - arxiv.org
During the last two years there has been a plethora of large generative models such as
ChatGPT or Stable Diffusion that have been published. Concretely, these models are able to …