PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

J Chen, C Ge, E Xie, Y Wu, L Yao, X Ren… - … on Computer Vision, 2024 - Springer
In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly
generating images at 4K resolution. PixArt-Σ represents a significant advancement over its …

Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining

D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …

Make a cheap scaling: A self-cascade diffusion model for higher-resolution adaptation

L Guo, Y He, H Chen, M Xia, X Cun, Y Wang… - … on Computer Vision, 2024 - Springer
Diffusion models have proven to be highly effective in image and video generation;
however, they encounter challenges in the correct composition of objects when generating …

FouriScale: A frequency perspective on training-free high-resolution image synthesis

L Huang, R Fang, A Zhang, G Song, S Liu… - … on Computer Vision, 2024 - Springer
In this study, we delve into the generation of high-resolution images from pre-trained
diffusion models, addressing persistent challenges, such as repetitive patterns and structural …

AccDiffusion: An accurate method for higher-resolution image generation

Z Lin, M Lin, M Zhao, R Ji - European Conference on Computer Vision, 2024 - Springer
This paper attempts to address the object repetition issue in patch-wise higher-resolution
image generation. We propose AccDiffusion, an accurate method for patch-wise higher …

LinFusion: 1 GPU, 1 minute, 16K image

S Liu, W Yu, Z Tan, X Wang - arXiv preprint arXiv:2409.02097, 2024 - arxiv.org
Modern diffusion models, particularly those utilizing a Transformer-based UNet for
denoising, rely heavily on self-attention operations to manage complex spatial relationships …

Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer

Z Yang, H Jiang, W Hong, J Teng, W Zheng… - … on Computer Vision, 2024 - Springer
Diffusion models have shown remarkable performance in image generation in recent years.
However, due to a quadratic increase in memory during generating ultra-high-resolution …

Wired Perspectives: Multi-View Wire Art Embraces Generative AI

Z Qu, L Yang, H Zhang, T Xiang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Creating multi-view wire art (MVWA), a static 3D sculpture with diverse interpretations from
different viewpoints, is a complex task even for skilled artists. In response, we present …

FreeEnhance: Tuning-free image enhancement via content-consistent noising-and-denoising process

Y Luo, Y Zhang, Z Qiu, T Yao, Z Chen… - Proceedings of the …, 2024 - dl.acm.org
The emergence of text-to-image generation models has led to the recognition that image
enhancement, performed as post-processing, would significantly improve the visual quality …

PartCraft: Crafting Creative Objects by Parts

KW Ng, X Zhu, YZ Song, T Xiang - European Conference on Computer …, 2024 - Springer
This paper propels creative control in generative visual AI by allowing users to “select”.
Departing from traditional text or sketch-based methods, we for the first time allow users to …