Transformers in vision: A survey
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …
vision community to study their application to computer vision problems. Among their salient …
A survey on contrastive self-supervised learning
Self-supervised learning has gained popularity because of its ability to avoid the cost of
annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as …
annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as …
Laion-5b: An open large-scale dataset for training next generation image-text models
C Schuhmann, R Beaumont, R Vencu… - Advances in …, 2022 - proceedings.neurips.cc
Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of
training on large amounts of noisy image-text data, without relying on expensive accurate …
training on large amounts of noisy image-text data, without relying on expensive accurate …
Photorealistic text-to-image diffusion models with deep language understanding
We present Imagen, a text-to-image diffusion model with an unprecedented degree of
photorealism and a deep level of language understanding. Imagen builds on the power of …
photorealism and a deep level of language understanding. Imagen builds on the power of …
High-resolution image synthesis with latent diffusion models
By decomposing the image formation process into a sequential application of denoising
autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image …
autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image …
[PDF][PDF] Scaling autoregressive models for content-rich text-to-image generation
Abstract We present the Pathways [1] Autoregressive Text-to-Image (Parti) model, which
generates high-fidelity photorealistic images and supports content-rich synthesis involving …
generates high-fidelity photorealistic images and supports content-rich synthesis involving …
Scaling up gans for text-to-image synthesis
The recent success of text-to-image synthesis has taken the world by storm and captured the
general public's imagination. From a technical standpoint, it also marked a drastic change in …
general public's imagination. From a technical standpoint, it also marked a drastic change in …
Multi-concept customization of text-to-image diffusion
While generative models produce high-quality images of concepts learned from a large-
scale database, a user often wishes to synthesize instantiations of their own concepts (for …
scale database, a user often wishes to synthesize instantiations of their own concepts (for …
Make-a-video: Text-to-video generation without text-video data
We propose Make-A-Video--an approach for directly translating the tremendous recent
progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple …
progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple …
Zero-shot text-to-image generation
Text-to-image generation has traditionally focused on finding better modeling assumptions
for training on a fixed dataset. These assumptions might involve complex architectures …
for training on a fixed dataset. These assumptions might involve complex architectures …