SVDQuant: Absorbing outliers by low-rank components for 4-bit diffusion models

M Li, Y Lin, Z Zhang, T Cai, X Li, J Guo, E Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have been proven highly effective at generating high-quality images.
However, as these models grow larger, they require significantly more memory and suffer …

Real-time video generation with pyramid attention broadcast

X Zhao, X Jin, K Wang, Y You - arXiv preprint arXiv:2408.12588, 2024 - arxiv.org
We present Pyramid Attention Broadcast (PAB), a real-time, high-quality, and training-free
approach for DiT-based video generation. Our method is founded on the observation that …

LazyDiT: Lazy learning for the acceleration of diffusion transformers

X Shen, Z Song, Y Zhou, B Chen, Y Li, Y Gong… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Transformers have emerged as the preeminent models for a wide array of
generative tasks, demonstrating superior performance and efficacy across various …

Accelerating diffusion transformers with token-wise feature caching

C Zou, X Liu, T Liu, S Huang, L Zhang - arXiv preprint arXiv:2410.05317, 2024 - arxiv.org
Diffusion transformers have shown significant effectiveness in both image and video
synthesis at the expense of huge computation costs. To address this problem, feature …

Unveiling Redundancy in Diffusion Transformers (DiTs): A Systematic Study

X Sun, J Fang, A Li, J Pan - arXiv preprint arXiv:2411.13588, 2024 - arxiv.org
The increased model capacity of Diffusion Transformers (DiTs) and the demand for
generating higher resolutions of images and videos have led to a significant rise in inference …

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

H You, C Barnes, Y Zhou, Y Kang, Z Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation
quality but suffer from high latency and memory inefficiency, making them difficult to deploy …

CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models

X Cheng, Z Chen, Z Jia - arXiv preprint arXiv:2502.00433, 2025 - arxiv.org
Diffusion models have revolutionized generative tasks, especially in the domain of text-to-
image synthesis; however, their iterative denoising process demands substantial …

Effortless Efficiency: Low-Cost Pruning of Diffusion Models

Y Zhang, E Jin, Y Dong, A Khakzar, P Torr… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have achieved impressive advancements in various vision tasks. However,
these gains often rely on increasing model size, which escalates computational complexity …

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

F Liu, S Zhang, X Wang, Y Wei, H Qiu, Y Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
As a fundamental backbone for video generation, diffusion models are challenged by low
inference speed due to the sequential nature of denoising. Previous methods speed up the …

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

J Fang, J Pan, X Sun, A Li, J Wang - arXiv preprint arXiv:2411.01738, 2024 - arxiv.org
Diffusion models are pivotal for generating high-quality images and videos. Inspired by the
success of OpenAI's Sora, the backbone of diffusion models is evolving from U-Net to …