Foundation models for generalist medical artificial intelligence
The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI)
models is likely to usher in newfound capabilities in medicine. We propose a new paradigm …
models is likely to usher in newfound capabilities in medicine. We propose a new paradigm …
Diffusion models in vision: A survey
Denoising diffusion models represent a recent emerging topic in computer vision,
demonstrating remarkable results in the area of generative modeling. A diffusion model is a …
demonstrating remarkable results in the area of generative modeling. A diffusion model is a …
Adding conditional control to text-to-image diffusion models
We present ControlNet, a neural network architecture to add spatial conditioning controls to
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …
Scalable diffusion models with transformers
We explore a new class of diffusion models based on the transformer architecture. We train
latent diffusion models of images, replacing the commonly-used U-Net backbone with a …
latent diffusion models of images, replacing the commonly-used U-Net backbone with a …
Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation
Score distillation sampling (SDS) has shown great promise in text-to-3D generation by
distilling pretrained large-scale text-to-image diffusion models, but suffers from over …
distilling pretrained large-scale text-to-image diffusion models, but suffers from over …
Photorealistic text-to-image diffusion models with deep language understanding
We present Imagen, a text-to-image diffusion model with an unprecedented degree of
photorealism and a deep level of language understanding. Imagen builds on the power of …
photorealism and a deep level of language understanding. Imagen builds on the power of …
T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models
The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
strong power of learning complex structures and meaningful semantics. However, relying …
strong power of learning complex structures and meaningful semantics. However, relying …
Imagen video: High definition video generation with diffusion models
We present Imagen Video, a text-conditional video generation system based on a cascade
of video diffusion models. Given a text prompt, Imagen Video generates high definition …
of video diffusion models. Given a text prompt, Imagen Video generates high definition …
High-resolution image synthesis with latent diffusion models
By decomposing the image formation process into a sequential application of denoising
autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image …
autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image …
Multi-concept customization of text-to-image diffusion
While generative models produce high-quality images of concepts learned from a large-
scale database, a user often wishes to synthesize instantiations of their own concepts (for …
scale database, a user often wishes to synthesize instantiations of their own concepts (for …