A survey on video diffusion models
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …
Sora: A review on background, technology, limitations, and opportunities of large vision models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …
LaVie: High-quality video generation with cascaded latent diffusion models
This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a
pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task …
Stable video diffusion: Scaling latent video diffusion models to large datasets
We present Stable Video Diffusion, a latent video diffusion model for high-resolution,
state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …
Lumiere: A space-time diffusion model for video generation
We introduce Lumiere, a text-to-video diffusion model designed for synthesizing videos that
portray realistic, diverse and coherent motion, a pivotal challenge in video synthesis. To this …
VideoPoet: A large language model for zero-shot video generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
Uni-ControlNet: All-in-one control to text-to-image diffusion models
Text-to-Image diffusion models have made tremendous progress over the past two years,
enabling the generation of highly realistic images based on open-domain text descriptions …
Align your Gaussians: Text-to-4D with dynamic 3D Gaussians and composed diffusion models
Text-guided diffusion models have revolutionized image and video generation and have
also been successfully used for optimization-based 3D object synthesis. Here we instead …
MotionCtrl: A unified and flexible motion controller for video generation
Motions in a video primarily consist of camera motion, induced by camera movement, and
object motion, resulting from object movement. Accurate control of both camera and object …
Champ: Controllable and consistent human image animation with 3D parametric guidance
In this study, we introduce a methodology for human image animation by leveraging a 3D
human parametric model within a latent diffusion framework to enhance shape alignment …