PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly
generating images at 4K resolution. PixArt-Σ represents a significant advancement over its …
Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining
We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …
Make a cheap scaling: A self-cascade diffusion model for higher-resolution adaptation
Diffusion models have proven to be highly effective in image and video generation;
however, they encounter challenges in the correct composition of objects when generating …
FouriScale: A frequency perspective on training-free high-resolution image synthesis
In this study, we delve into the generation of high-resolution images from pre-trained
diffusion models, addressing persistent challenges, such as repetitive patterns and structural …
AccDiffusion: An accurate method for higher-resolution image generation
This paper attempts to address the object repetition issue in patch-wise higher-resolution
image generation. We propose AccDiffusion, an accurate method for patch-wise higher …
LinFusion: 1 GPU, 1 minute, 16K image
Modern diffusion models, particularly those utilizing a Transformer-based UNet for
denoising, rely heavily on self-attention operations to manage complex spatial relationships …
Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer
Diffusion models have shown remarkable performance in image generation in recent years.
However, due to a quadratic increase in memory during generating ultra-high-resolution …
Wired Perspectives: Multi-View Wire Art Embraces Generative AI
Creating multi-view wire art (MVWA), a static 3D sculpture with diverse interpretations from
different viewpoints, is a complex task even for skilled artists. In response, we present …
FreeEnhance: Tuning-free image enhancement via content-consistent noising-and-denoising process
The emergence of text-to-image generation models has led to the recognition that image
enhancement, performed as post-processing, would significantly improve the visual quality …
PartCraft: Crafting Creative Objects by Parts
This paper propels creative control in generative visual AI by allowing users to “select”.
Departing from traditional text or sketch-based methods, we for the first time allow users to …