A survey on video diffusion models
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …
Sora: A review on background, technology, limitations, and opportunities of large vision models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …
model is trained to generate videos of realistic or imaginative scenes from text instructions …
Lavie: High-quality video generation with cascaded latent diffusion models
This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a
pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task …
pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task …
Sv3d: Novel multi-view synthesis and 3d generation from a single image using latent video diffusion
Abstract We present Stable Video 3D (SV3D)—a latent video diffusion model for high-
resolution, image-to-multi-view generation of orbital videos around a 3D object. Recent …
resolution, image-to-multi-view generation of orbital videos around a 3D object. Recent …
Dynamicrafter: Animating open-domain images with video diffusion priors
Animating a still image offers an engaging visual experience. Traditional image animation
techniques mainly focus on animating natural scenes with stochastic dynamics (eg clouds …
techniques mainly focus on animating natural scenes with stochastic dynamics (eg clouds …
Photorealistic video generation with diffusion models
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
Lumiere: A space-time diffusion model for video generation
We introduce Lumiere–a text-to-video diffusion model designed for synthesizing videos that
portray realistic, diverse and coherent motion–a pivotal challenge in video synthesis. To this …
portray realistic, diverse and coherent motion–a pivotal challenge in video synthesis. To this …
Videopoet: A large language model for zero-shot video generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
methods to data compression. Recent advances in statistical machine learning have opened …
Fast high-resolution image synthesis with latent adversarial diffusion distillation
Diffusion models are the main driver of progress in image and video synthesis, but suffer
from slow inference speed. Distillation methods, like the recently introduced adversarial …
from slow inference speed. Distillation methods, like the recently introduced adversarial …