Learning-to-cache: Accelerating diffusion transformer via layer caching
Diffusion Transformers have recently demonstrated unprecedented generative capabilities
for various tasks. The encouraging results, however, come with the cost of slow inference …
for various tasks. The encouraging results, however, come with the cost of slow inference …
Demystifying Singular Defects in Large Language Models
Large transformer models are known to produce high-norm tokens. In vision transformers
(ViTs), such tokens have been mathematically modeled through the singular vectors of the …
(ViTs), such tokens have been mathematically modeled through the singular vectors of the …
Unsupervised Model Tree Heritage Recovery
The number of models shared online has recently skyrocketed, with over one million public
models available on Hugging Face. Sharing models allows other users to build on existing …
models available on Hugging Face. Sharing models allows other users to build on existing …