Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

X Ma, G Fang, M Bi Mi, X Wang - Advances in Neural …, 2025 - proceedings.neurips.cc
Diffusion Transformers have recently demonstrated unprecedented generative capabilities
for various tasks. The encouraging results, however, come with the cost of slow inference …

Demystifying Singular Defects in Large Language Models

H Wang, T Zhang, M Salzmann - arXiv preprint arXiv:2502.07004, 2025 - arxiv.org
Large transformer models are known to produce high-norm tokens. In vision transformers
(ViTs), such tokens have been mathematically modeled through the singular vectors of the …

Unsupervised Model Tree Heritage Recovery

E Horwitz, A Shul, Y Hoshen - The Thirteenth International Conference on … - openreview.net
The number of models shared online has recently skyrocketed, with over one million public
models available on Hugging Face. Sharing models allows other users to build on existing …