Foundation models for generalist medical artificial intelligence

M Moor, O Banerjee, ZSH Abad, HM Krumholz… - Nature, 2023 - nature.com
The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI)
models is likely to usher in newfound capabilities in medicine. We propose a new paradigm …

Diffusion models in vision: A survey

FA Croitoru, V Hondru, RT Ionescu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Denoising diffusion models represent a recent emerging topic in computer vision,
demonstrating remarkable results in the area of generative modeling. A diffusion model is a …

Adding conditional control to text-to-image diffusion models

L Zhang, A Rao, M Agrawala - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
We present ControlNet, a neural network architecture to add spatial conditioning controls to
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …

Scalable diffusion models with transformers

W Peebles, S **e - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
We explore a new class of diffusion models based on the transformer architecture. We train
latent diffusion models of images, replacing the commonly-used U-Net backbone with a …

Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation

Z Wang, C Lu, Y Wang, F Bao, C Li… - Advances in Neural …, 2024 - proceedings.neurips.cc
Score distillation sampling (SDS) has shown great promise in text-to-3D generation by
distilling pretrained large-scale text-to-image diffusion models, but suffers from over …

Photorealistic text-to-image diffusion models with deep language understanding

C Saharia, W Chan, S Saxena, L Li… - Advances in neural …, 2022 - proceedings.neurips.cc
We present Imagen, a text-to-image diffusion model with an unprecedented degree of
photorealism and a deep level of language understanding. Imagen builds on the power of …

T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

C Mou, X Wang, L **e, Y Wu, J Zhang, Z Qi… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
strong power of learning complex structures and meaningful semantics. However, relying …

Imagen video: High definition video generation with diffusion models

J Ho, W Chan, C Saharia, J Whang, R Gao… - arxiv preprint arxiv …, 2022 - arxiv.org
We present Imagen Video, a text-conditional video generation system based on a cascade
of video diffusion models. Given a text prompt, Imagen Video generates high definition …

High-resolution image synthesis with latent diffusion models

R Rombach, A Blattmann, D Lorenz… - Proceedings of the …, 2022 - openaccess.thecvf.com
By decomposing the image formation process into a sequential application of denoising
autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image …

Multi-concept customization of text-to-image diffusion

N Kumari, B Zhang, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
While generative models produce high-quality images of concepts learned from a large-
scale database, a user often wishes to synthesize instantiations of their own concepts (for …