A survey on video diffusion models
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …
Align your latents: High-resolution video synthesis with latent diffusion models
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding
excessive compute demands by training a diffusion model in a compressed lower …
Scaling up GANs for text-to-image synthesis
The recent success of text-to-image synthesis has taken the world by storm and captured the
general public's imagination. From a technical standpoint, it also marked a drastic change in …
AnimateDiff: Animate your personalized text-to-image diffusion models without specific tuning
With the advance of text-to-image (T2I) diffusion models (e.g., Stable Diffusion) and
corresponding personalization techniques such as DreamBooth and LoRA, everyone can …
Adversarial diffusion distillation
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that
efficiently samples large-scale foundational image diffusion models in just 1–4 steps while …
T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models
The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
strong power of learning complex structures and meaningful semantics. However, relying …
Open-vocabulary panoptic segmentation with text-to-image diffusion models
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …
SDXL: Improving latent diffusion models for high-resolution image synthesis
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to
previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone …
Structure and content-guided video synthesis with diffusion models
P Esser, J Chiu, P Atighehchian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-guided generative diffusion models unlock powerful image creation and editing tools.
Recent approaches that edit the content of footage while retaining structure require …
Fantasia3D: Disentangling geometry and appearance for high-quality text-to-3D content creation
Automatic 3D content creation has achieved rapid progress recently due to the availability of
pre-trained, large language models and image diffusion models, forming the emerging topic …