Adversarial diffusion distillation

A Sauer, D Lorenz, A Blattmann… - European Conference on …, 2024 - Springer
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that
efficiently samples large-scale foundational image diffusion models in just 1–4 steps while …

Analyzing and improving the training dynamics of diffusion models

T Karras, M Aittala, J Lehtinen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models currently dominate the field of data-driven image synthesis with their
unparalleled scaling to large datasets. In this paper we identify and rectify several causes for …

Guiding a diffusion model with a bad version of itself

T Karras, M Aittala, T Kynkäänniemi… - Advances in …, 2025 - proceedings.neurips.cc
The primary axes of interest in image-generating diffusion models are image quality, the
amount of variation in the results, and how well the results align with a given condition, e.g., a …

ZigMa: A DiT-style zigzag Mamba diffusion model

VT Hu, SA Baumann, M Gui, O Grebenkova… - … on Computer Vision, 2024 - Springer
The diffusion model has long been plagued by scalability and quadratic complexity issues,
especially within transformer-based structures. In this study, we aim to leverage the long …

DOCCI: Descriptions of connected and contrasting images

Y Onoe, S Rane, Z Berger, Y Bitton, J Cho… - … on Computer Vision, 2024 - Springer
Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T)
research. However, current datasets lack descriptions with fine-grained detail that would …

Applying guidance in a limited interval improves sample and distribution quality in diffusion models

T Kynkäänniemi, M Aittala, T Karras, S Laine… - arXiv preprint arXiv …, 2024 - arxiv.org
Guidance is a crucial technique for extracting the best performance out of image-generating
diffusion models. Traditionally, a constant guidance weight has been applied throughout the …

GenHowTo: Learning to generate actions and state transformations from instructional videos

T Souček, D Damen, M Wray… - Proceedings of the …, 2024 - openaccess.thecvf.com
We address the task of generating temporally consistent and physically plausible images of
actions and object state transformations. Given an input image and a text prompt describing …

Blue noise for diffusion models

X Huang, C Salaun, C Vasconcelos… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
Most of the existing diffusion models use Gaussian noise for training and sampling across all
time steps, which may not optimally account for the frequency contents reconstructed by the …

Towards geographic inclusion in the evaluation of text-to-image models

M Hall, SJ Bell, C Ross, A Williams… - Proceedings of the …, 2024 - dl.acm.org
Rapid progress in text-to-image generative models coupled with their deployment for visual
content creation has magnified the importance of thoroughly evaluating their performance …

Conformal prediction sets improve human decision making

JC Cresswell, Y Sui, B Kumar, N Vouitsis - arXiv preprint arXiv:2401.13744, 2024 - arxiv.org
In response to everyday queries, humans explicitly signal uncertainty and offer alternative
answers when they are unsure. Machine learning models that output calibrated prediction …