Align your latents: High-resolution video synthesis with latent diffusion models

A Blattmann, R Rombach, H Ling… - Proceedings of the …, 2023‏ - openaccess.thecvf.com
Abstract Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding
excessive compute demands by training a diffusion model in a compressed lower …

Stable video diffusion: Scaling latent video diffusion models to large datasets

A Blattmann, T Dockhorn, S Kulal… - arxiv preprint arxiv …, 2023‏ - arxiv.org
We present Stable Video Diffusion-a latent video diffusion model for high-resolution, state-of-
the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …

Next-gpt: Any-to-any multimodal llm

S Wu, H Fei, L Qu, W Ji, TS Chua - Forty-first International …, 2024‏ - openreview.net
While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides,
they mostly fall prey to the limitation of only input-side multimodal understanding, without the …

Ai-generated content (aigc) for various data modalities: A survey

LG Foo, H Rahmani, J Liu - arxiv preprint arxiv:2308.14177, 2023‏ - arxiv.org
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and
other media using AI algorithms. Due to its wide range of applications and the demonstrated …

Dynamicrafter: Animating open-domain images with video diffusion priors

J **ng, M **a, Y Zhang, H Chen, W Yu, H Liu… - … on Computer Vision, 2024‏ - Springer
Animating a still image offers an engaging visual experience. Traditional image animation
techniques mainly focus on animating natural scenes with stochastic dynamics (eg clouds …

Conditional image-to-video generation with latent flow diffusion models

H Ni, C Shi, K Li, SX Huang… - Proceedings of the IEEE …, 2023‏ - openaccess.thecvf.com
Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video
starting from an image (eg, a person's face) and a condition (eg, an action class label like …

Driving into the future: Multiview visual forecasting and planning with world model for autonomous driving

Y Wang, J He, L Fan, H Li, Y Chen… - Proceedings of the …, 2024‏ - openaccess.thecvf.com
In autonomous driving predicting future events in advance and evaluating the foreseeable
risks empowers autonomous vehicles to plan their actions enhancing safety and efficiency …

[HTML][HTML] Diffusion probabilistic modeling for video generation

R Yang, P Srivastava, S Mandt - Entropy, 2023‏ - mdpi.com
Denoising diffusion probabilistic models are a promising new class of generative models
that mark a milestone in high-quality image generation. This paper showcases their ability to …

Generative image dynamics

Z Li, R Tucker, N Snavely… - Proceedings of the IEEE …, 2024‏ - openaccess.thecvf.com
We present an approach to modeling an image-space prior on scene motion. Our prior is
learned from a collection of motion trajectories extracted from real video sequences …

Maskvit: Masked visual pre-training for video prediction

A Gupta, S Tian, Y Zhang, J Wu, R Martín-Martín… - arxiv preprint arxiv …, 2022‏ - arxiv.org
The ability to predict future visual observations conditioned on past observations and motor
commands can enable embodied agents to plan solutions to a variety of tasks in complex …