State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

SDXL: Improving latent diffusion models for high-resolution image synthesis

D Podell, Z English, K Lacey, A Blattmann… - arXiv preprint arXiv …, 2023 - arxiv.org
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to
previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone …

Adversarial diffusion distillation

A Sauer, D Lorenz, A Blattmann… - European Conference on …, 2024 - Springer
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that
efficiently samples large-scale foundational image diffusion models in just 1–4 steps while …

DeepCache: Accelerating diffusion models for free

X Ma, G Fang, X Wang - … of the IEEE/CVF conference on …, 2024 - openaccess.thecvf.com
Diffusion models have recently gained unprecedented attention in the field of image
synthesis due to their remarkable generative capabilities. Notwithstanding their prowess …

InstaFlow: One step is enough for high-quality diffusion-based text-to-image generation

X Liu, X Zhang, J Ma, J Peng - The Twelfth International …, 2023 - openreview.net
Diffusion models have revolutionized text-to-image generation with their exceptional quality
and creativity. However, their multi-step sampling process is known to be slow, often requiring …

DistriFusion: Distributed parallel inference for high-resolution diffusion models

M Li, T Cai, J Cao, Q Zhang, H Cai… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models have achieved great success in synthesizing high-quality images.
However generating high-resolution images with diffusion models is still challenging due to …

Snap video: Scaled spatiotemporal transformers for text-to-video synthesis

W Menapace, A Siarohin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contemporary models for generating images show remarkable quality and versatility.
Swayed by these advantages the research community repurposes them to generate videos …

UFOGen: You forward once large scale text-to-image generation via diffusion GANs

Y Xu, Y Zhao, Z Xiao, T Hou - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Text-to-image diffusion models have demonstrated remarkable capabilities in transforming
text prompts into coherent images yet the computational cost of the multi-step inference …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

MobileDiffusion: Instant text-to-image generation on mobile devices

Y Zhao, Y Xu, Z Xiao, H Jia, T Hou - European Conference on Computer …, 2024 - Springer
The deployment of large-scale text-to-image diffusion models on mobile devices is impeded
by their substantial model size and high latency. In this paper, we present MobileDiffusion …