FastComposer: Tuning-free multi-subject image generation with localized attention

G Xiao, T Yin, WT Freeman, F Durand… - International Journal of …, 2024 - Springer
Diffusion models excel at text-to-image generation, especially in subject-driven generation
for personalized images. However, existing methods are inefficient due to the subject …

Representation alignment for generation: Training diffusion transformers is easier than you think

S Yu, S Kwak, H Jang, J Jeong, J Huang, J Shin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that the denoising process in (generative) diffusion models can
induce meaningful (discriminative) representations inside the model, though the quality of …

LVCD: reference-based lineart video colorization with diffusion models

Z Huang, M Zhang, J Liao - ACM Transactions on Graphics (TOG), 2024 - dl.acm.org
We propose the first video diffusion framework for reference-based lineart video colorization.
Unlike previous works that rely solely on image generative models to colorize lineart frame …

SVDQuant: Absorbing outliers by low-rank components for 4-bit diffusion models

M Li, Y Lin, Z Zhang, T Cai, X Li, J Guo, E Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have been proven highly effective at generating high-quality images.
However, as these models grow larger, they require significantly more memory and suffer …

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference

Z Zhan, Y Wu, Y Gong, Z Meng, Z Kong, C Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid progress in artificial intelligence-generated content (AIGC), especially with
diffusion models, has significantly advanced development of high-quality video generation …

Diffusion Adversarial Post-Training for One-Step Video Generation

S Lin, X Xia, Y Ren, C Yang, X Xiao, L Jiang - arXiv preprint arXiv …, 2025 - arxiv.org
The diffusion models are widely used for image and video generation, but their iterative
generation process is slow and expensive. While existing distillation approaches have …

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

H You, C Barnes, Y Zhou, Y Kang, Z Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation
quality but suffer from high latency and memory inefficiency, making them difficult to deploy …

From slow bidirectional to fast causal video generators

T Yin, Q Zhang, R Zhang, WT Freeman… - arXiv preprint arXiv …, 2024 - arxiv.org
Current video diffusion models achieve impressive generation quality but struggle in
interactive applications due to bidirectional attention dependencies. The generation of a …

Adversarial diffusion compression for real-world image super-resolution

B Chen, G Li, R Wu, X Zhang, J Chen, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Real-world image super-resolution (Real-ISR) aims to reconstruct high-resolution images
from low-resolution inputs degraded by complex, unknown processes. While many Stable …

InstantDrag: Improving Interactivity in Drag-based Image Editing

J Shin, D Choi, J Park - SIGGRAPH Asia 2024 Conference Papers, 2024 - dl.acm.org
Drag-based image editing has recently gained popularity for its interactivity and precision.
However, despite the ability of text-to-image models to generate samples within a second …