A survey on video diffusion models

Z **ng, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arxiv preprint arxiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

Pix2video: Video editing using image diffusion

D Ceylan, CHP Huang, NJ Mitra - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image diffusion models, trained on massive image collections, have emerged as the most
versatile image generator model in terms of quality and diversity. They support inverting real …

Photorealistic video generation with diffusion models

A Gupta, L Yu, K Sohn, X Gu, M Hahn, FF Li… - … on Computer Vision, 2024 - Springer
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …

Stablevideo: Text-driven consistency-aware diffusion video editing

W Chai, X Guo, G Wang, Y Lu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Diffusion-based methods can generate realistic images and videos, but they struggle to edit
existing objects in a video while preserving their geometry over time. This prevents diffusion …

Videopoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arxiv preprint arxiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

Motiondirector: Motion customization of text-to-video diffusion models

R Zhao, Y Gu, JZ Wu, DJ Zhang, JW Liu, W Wu… - … on Computer Vision, 2024 - Springer
Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse
video generations. Given a set of video clips of the same motion concept, the task of Motion …

Videocrafter2: Overcoming data limitations for high-quality video diffusion models

H Chen, Y Zhang, X Cun, M **a… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-video generation aims to produce a video based on a given prompt. Recently
several commercial video models have been able to generate plausible videos with minimal …

Generative image dynamics

Z Li, R Tucker, N Snavely… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We present an approach to modeling an image-space prior on scene motion. Our prior is
learned from a collection of motion trajectories extracted from real video sequences …

Zigma: A dit-style zigzag mamba diffusion model

VT Hu, SA Baumann, M Gui, O Grebenkova… - … on Computer Vision, 2024 - Springer
The diffusion model has long been plagued by scalability and quadratic complexity issues,
especially within transformer-based structures. In this study, we aim to leverage the long …