A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

Align your latents: High-resolution video synthesis with latent diffusion models

A Blattmann, R Rombach, H Ling… - Proceedings of the …, 2023 - openaccess.thecvf.com
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding
excessive compute demands by training a diffusion model in a compressed lower …
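
The mechanism this entry hinges on, running the diffusion process on compressed latents from a pretrained autoencoder rather than on pixels, can be summarized with a minimal sketch. The encoder, noise schedule, and shapes below are illustrative assumptions, not the paper's actual components.

import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    # Stand-in for a pretrained autoencoder encoder; real LDMs use a learned
    # VAE/VQ model, not a reshape-and-subsample.
    return x.reshape(len(x), -1)[:, ::8]          # crude 8x "compression"

T = 1000
betas = np.linspace(1e-4, 0.02, T)                # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

x = rng.standard_normal((2, 64, 64, 3))           # a batch of "frames"
z0 = encode(x)                                    # diffusion operates here, not on pixels
t = 500
eps = rng.standard_normal(z0.shape)
z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
# A denoising network would be trained to predict eps from (z_t, t); after
# sampling, the autoencoder's decoder maps the final latent back to pixels.

The compute saving comes purely from dimensionality: at every denoising step the network sees the small latent tensor instead of the full-resolution frames.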

Video diffusion models

J Ho, T Salimans, A Gritsenko… - Advances in …, 2022 - proceedings.neurips.cc
Generating temporally coherent high fidelity video is an important milestone in generative
modeling research. We make progress towards this milestone by proposing a diffusion …

Preserve your own correlation: A noise prior for video diffusion models

S Ge, S Nah, G Liu, T Poon, A Tao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite tremendous progress in generating high-quality images using diffusion models,
synthesizing a sequence of animated frames that are both photorealistic and temporally …

Phenaki: Variable length video generation from open domain textual descriptions

R Villegas, M Babaeizadeh, PJ Kindermans… - International …, 2022 - openreview.net
We present Phenaki, a model capable of realistic video synthesis given a sequence of
textual prompts. Generating videos from text is particularly challenging due to the …

Videofusion: Decomposed diffusion models for high-quality video generation

Z Luo, D Chen, Y Zhang, Y Huang, L Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
A diffusion probabilistic model (DPM), which constructs a forward diffusion process by
gradually adding noise to data points and learns the reverse denoising process to generate …
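
The forward/reverse structure this snippet describes is the standard diffusion probabilistic model setup. As a reminder only (generic DDPM notation, not VideoFusion's decomposed formulation), the forward process gradually corrupts data,

\[
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\big),
\quad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s),
\]

while the learned reverse process \(p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t,t),\ \sigma_t^2 \mathbf{I}\big)\) is typically trained with the noise-prediction objective \(\mathbb{E}\big[\lVert \epsilon - \epsilon_\theta(x_t,t)\rVert^2\big]\).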

Cogvideo: Large-scale pretraining for text-to-video generation via transformers

W Hong, M Ding, W Zheng, X Liu, J Tang - arXiv preprint arXiv:2205.15868, 2022 - arxiv.org
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-
image (DALL-E and CogView) generation. Their application to video generation is still facing …

Show-1: Marrying pixel and latent diffusion models for text-to-video generation

DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu… - International Journal of …, 2024 - Springer
Significant advancements have been achieved in the realm of large-scale pre-trained text-to-
video Diffusion Models (VDMs). However, previous methods either rely solely on pixel …

Photorealistic video generation with diffusion models

A Gupta, L Yu, K Sohn, X Gu, M Hahn, FF Li… - … on Computer Vision, 2024 - Springer
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
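
The "causal encoder" in this snippet is only partially quoted; as a generic illustration of what temporal causality usually means for a video encoder (each frame's code may depend only on earlier frames), here is a sketch using one-sided padding. The function, kernel, and shapes are hypothetical and are not WALT's architecture.

import numpy as np

def causal_temporal_conv(x, kernel):
    """x: (T, C) per-frame features; kernel: (K,) temporal filter.
    Pads only on the past side, so output[t] never sees frames after t."""
    K = len(kernel)
    x_padded = np.concatenate([np.zeros((K - 1, x.shape[1])), x], axis=0)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        window = x_padded[t:t + K]     # frames t-K+1 .. t in original indexing
        out[t] = kernel @ window
    return out

frames = np.random.randn(8, 4)          # 8 frames, 4 channels each
y = causal_temporal_conv(frames, np.array([0.2, 0.3, 0.5]))
# With one-sided padding the first frame depends only on itself, so the same
# encoder can also treat a single image as a one-frame video.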