Align your latents: High-resolution video synthesis with latent diffusion models
Abstract Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding
excessive compute demands by training a diffusion model in a compressed lower …
DynamiCrafter: Animating open-domain images with video diffusion priors
Animating a still image offers an engaging visual experience. Traditional image animation
techniques mainly focus on animating natural scenes with stochastic dynamics (e.g., clouds …
Stable video diffusion: Scaling latent video diffusion models to large datasets
We present Stable Video Diffusion, a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …
DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving
World models, especially in autonomous driving, are trending and drawing extensive
attention due to their capacity for comprehending driving environments. The established …
MCVD: Masked conditional video diffusion for prediction, generation, and interpolation
Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor, and generalization beyond the training data …
SimDA: Simple diffusion adapter for efficient video generation
The recent wave of AI-generated content has witnessed the great development and success
of Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of …
Generative image dynamics
We present an approach to modeling an image-space prior on scene motion. Our prior is
learned from a collection of motion trajectories extracted from real video sequences …
NÜWA: Visual synthesis pre-training for neural visual world creation
This paper presents a unified multimodal pre-trained model called NÜWA that can generate
new or manipulate existing visual data (i.e., images and videos) for various visual synthesis …
Model-based imitation learning for urban driving
An accurate model of the environment and the dynamic agents acting in it offers great
potential for improving motion planning. We present MILE: a Model-based Imitation …
GAIA-1: A generative world model for autonomous driving
Autonomous driving promises transformative improvements to transportation, but building
systems capable of safely navigating the unstructured complexity of real-world scenarios …