A survey on video diffusion models

Z **ng, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

Videocrafter2: Overcoming data limitations for high-quality video diffusion models

H Chen, Y Zhang, X Cun, M **a… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-video generation aims to produce a video based on a given prompt. Recently
several commercial video models have been able to generate plausible videos with minimal …

Pix2video: Video editing using image diffusion

D Ceylan, CHP Huang, NJ Mitra - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image diffusion models, trained on massive image collections, have emerged as the most
versatile image generator model in terms of quality and diversity. They support inverting real …

Stablevideo: Text-driven consistency-aware diffusion video editing

W Chai, X Guo, G Wang, Y Lu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Diffusion-based methods can generate realistic images and videos, but they struggle to edit
existing objects in a video while preserving their geometry over time. This prevents diffusion …

Photorealistic video generation with diffusion models

A Gupta, L Yu, K Sohn, X Gu, M Hahn, FF Li… - … on Computer Vision, 2024 - Springer
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …

Videopoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arxiv preprint arxiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

Latte: Latent diffusion transformer for video generation

X Ma, Y Wang, G Jia, X Chen, Z Liu, YF Li… - arxiv preprint arxiv …, 2024 - arxiv.org
We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte
first extracts spatio-temporal tokens from input videos and then adopts a series of …

Latent video diffusion models for high-fidelity long video generation

Y He, T Yang, Y Zhang, Y Shan, Q Chen - arxiv preprint arxiv:2211.13221, 2022 - arxiv.org
AI-generated content has attracted lots of attention recently, but photo-realistic video
synthesis is still challenging. Although many attempts using GANs and autoregressive …

Motiondirector: Motion customization of text-to-video diffusion models

R Zhao, Y Gu, JZ Wu, DJ Zhang, JW Liu, W Wu… - … on Computer Vision, 2024 - Springer
Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse
video generations. Given a set of video clips of the same motion concept, the task of Motion …