- Academic Search

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

J Chen, C Ge, E Xie, Y Wu, L Yao, X Ren… - … on Computer Vision, 2024 - Springer

In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly
generating images at 4K resolution. PixArt-Σ represents a significant advancement over its …

Cited by 105

[PDF] openreview.net

DiffiT: Diffusion vision transformers for image generation

A Hatamizadeh, J Song, G Liu, J Kautz… - European Conference on …, 2024 - Springer

Diffusion models with their powerful expressivity and high sample quality have achieved
State-Of-The-Art (SOTA) performance in the generative domain. The pioneering Vision …

Cited by 48

[PDF] arxiv.org

Representation alignment for generation: Training diffusion transformers is easier than you think

S Yu, S Kwak, H Jang, J Jeong, J Huang, J Shin… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent studies have shown that the denoising process in (generative) diffusion models can
induce meaningful (discriminative) representations inside the model, though the quality of …

Cited by 23

[PDF] arxiv.org

Efficient diffusion transformer with step-wise dynamic attention mediators

Y Pu, Z Xia, J Guo, D Han, Q Li, D Li, Y Yuan… - … on Computer Vision, 2024 - Springer

This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …

Cited by 8

[PDF] arxiv.org

Multi-layer transformers gradient can be approximated in almost linear time

Y Liang, Z Sha, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2408.13233, 2024 - arxiv.org

The computational complexity of the self-attention mechanism in popular transformer
architectures poses significant challenges for training and inference, and becomes the …

Cited by 24

[PDF] arxiv.org

Mora: Enabling generalist video generation via a multi-agent framework

Z Yuan, Y Liu, Y Cao, W Sun, H Jia, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-to-video generation has made significant strides, but replicating the capabilities of
advanced systems like OpenAI Sora remains challenging due to their closed-source nature …

Cited by 19

[PDF] arxiv.org

On statistical rates and provably efficient criteria of latent diffusion transformers (DiTs)

JYC Hu, W Wu, Z Song, H Liu - arXiv preprint arXiv:2407.01079, 2024 - arxiv.org

We investigate the statistical and computational limits of latent Diffusion Transformers (DiTs)
under the low-dimensional linear latent space assumption. Statistically, we study the …

Cited by 18

[PDF] arxiv.org

Deep compression autoencoder for efficient high-resolution diffusion models

J Chen, H Cai, J Chen, E Xie, S Yang, H Tang… - arXiv preprint arXiv …, 2024 - arxiv.org

We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models
for accelerating high-resolution diffusion models. Existing autoencoder models have …

Cited by 9

[PDF] arxiv.org

Efficient diffusion models: A comprehensive survey from principles to practices

Z Ma, Y Zhang, G Jia, L Zhao, Y Ma, M Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

As one of the most popular and sought-after generative models in the recent years, diffusion
models have sparked the interests of many researchers and steadily shown excellent …

Cited by 2

[PDF] arxiv.org

Randomized autoregressive visual generation

Q Yu, J He, X Deng, X Shen, LC Chen - arXiv preprint arXiv:2411.00776, 2024 - arxiv.org

This paper presents Randomized AutoRegressive modeling (RAR) for visual generation,
which sets a new state-of-the-art performance on the image generation task while …

Cited by 9
