A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …
Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing
Despite the success in large-scale text-to-image generation and text-conditioned image
editing, existing methods still struggle to produce consistent generation and editing results …
Deep learning in food category recognition
Integrating artificial intelligence with food category recognition has been a field of interest for
research for the past few decades. It is potentially one of the next steps in revolutionizing …
Scaling autoregressive models for content-rich text-to-image generation
We present the Pathways [1] Autoregressive Text-to-Image (Parti) model, which
generates high-fidelity photorealistic images and supports content-rich synthesis involving …
SpaText: Spatio-textual representation for controllable image generation
Recent text-to-image diffusion models are able to generate convincing results of
unprecedented quality. However, it is nearly impossible to control the shapes of different …
Blended latent diffusion
The tremendous progress in neural image generation, coupled with the emergence of
seemingly omnipotent vision-language models, has finally enabled text-based interfaces for …
MAXIM: Multi-axis MLP for image processing
Recent progress on Transformers and multi-layer perceptron (MLP) models provides new
network architectural designs for computer vision tasks. Although these models proved to be …
Blended diffusion for text-driven editing of natural images
Natural language offers a highly intuitive interface for image editing. In this paper, we
introduce the first solution for performing local (region-based) edits in generic natural …
Vector quantized diffusion model for text-to-image synthesis
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …