PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly
generating images at 4K resolution. PixArt-Σ represents a significant advancement over its …
Diffit: Diffusion vision transformers for image generation
Diffusion models with their powerful expressivity and high sample quality have achieved
State-Of-The-Art (SOTA) performance in the generative domain. The pioneering Vision …
Representation alignment for generation: Training diffusion transformers is easier than you think
Recent studies have shown that the denoising process in (generative) diffusion models can
induce meaningful (discriminative) representations inside the model, though the quality of …
Efficient diffusion transformer with step-wise dynamic attention mediators
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …
Multi-layer transformers gradient can be approximated in almost linear time
The computational complexity of the self-attention mechanism in popular transformer
architectures poses significant challenges for training and inference, and becomes the …
Mora: Enabling generalist video generation via a multi-agent framework
Text-to-video generation has made significant strides, but replicating the capabilities of
advanced systems like OpenAI Sora remains challenging due to their closed-source nature …
On statistical rates and provably efficient criteria of latent diffusion transformers (DiTs)
We investigate the statistical and computational limits of latent Diffusion Transformers (DiTs)
under the low-dimensional linear latent space assumption. Statistically, we study the …
Deep compression autoencoder for efficient high-resolution diffusion models
We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models
for accelerating high-resolution diffusion models. Existing autoencoder models have …
Efficient diffusion models: A comprehensive survey from principles to practices
Z Ma, Y Zhang, G Jia, L Zhao, Y Ma, M Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the most popular and sought-after generative models in the recent years, diffusion
models have sparked the interests of many researchers and steadily shown excellent …
Randomized autoregressive visual generation
This paper presents Randomized AutoRegressive modeling (RAR) for visual generation,
which sets a new state-of-the-art performance on the image generation task while …