Align your latents: High-resolution video synthesis with latent diffusion models

A Blattmann, R Rombach, H Ling… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding
excessive compute demands by training a diffusion model in a compressed lower …

Dynamicrafter: Animating open-domain images with video diffusion priors

J **ng, M **a, Y Zhang, H Chen, W Yu, H Liu… - … on Computer Vision, 2024 - Springer
Animating a still image offers an engaging visual experience. Traditional image animation
techniques mainly focus on animating natural scenes with stochastic dynamics (eg clouds …

Stable video diffusion: Scaling latent video diffusion models to large datasets

A Blattmann, T Dockhorn, S Kulal… - arxiv preprint arxiv …, 2023 - arxiv.org
We present Stable Video Diffusion-a latent video diffusion model for high-resolution, state-of-
the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …

DriveDreamer: Towards Real-World-Drive World Models for Autonomous Driving

X Wang, Z Zhu, G Huang, X Chen, J Zhu… - European Conference on …, 2024 - Springer
World models, especially in autonomous driving, are trending and drawing extensive
attention due to their capacity for comprehending driving environments. The established …

Mcvd-masked conditional video diffusion for prediction, generation, and interpolation

V Voleti, A Jolicoeur-Martineau… - Advances in neural …, 2022 - proceedings.neurips.cc
Video prediction is a challenging task. The quality of video frames from current state-of-the-
art (SOTA) generative models tends to be poor and generalization beyond the training data …

Simda: Simple diffusion adapter for efficient video generation

Z **ng, Q Dai, H Hu, Z Wu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The recent wave of AI-generated content has witnessed the great development and success
of Text-to-Image (T2I) technologies. By contrast Text-to-Video (T2V) still falls short of …

Generative image dynamics

Z Li, R Tucker, N Snavely… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We present an approach to modeling an image-space prior on scene motion. Our prior is
learned from a collection of motion trajectories extracted from real video sequences …

Nüwa: Visual synthesis pre-training for neural visual world creation

C Wu, J Liang, L Ji, F Yang, Y Fang, D Jiang… - European conference on …, 2022 - Springer
This paper presents a unified multimodal pre-trained model called NÜWA that can generate
new or manipulate existing visual data (ie, images and videos) for various visual synthesis …

Model-based imitation learning for urban driving

A Hu, G Corrado, N Griffiths, Z Murez… - Advances in …, 2022 - proceedings.neurips.cc
An accurate model of the environment and the dynamic agents acting in it offers great
potential for improving motion planning. We present MILE: a Model-based Imitation …

Gaia-1: A generative world model for autonomous driving

A Hu, L Russell, H Yeo, Z Murez, G Fedoseev… - arxiv preprint arxiv …, 2023 - arxiv.org
Autonomous driving promises transformative improvements to transportation, but building
systems capable of safely navigating the unstructured complexity of real-world scenarios …