A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

SparseCtrl: Adding sparse controls to text-to-video diffusion models

Y Guo, C Yang, A Rao, M Agrawala, D Lin… - European Conference on …, 2024 - Springer
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

MiraData: A large-scale video dataset with long durations and structured captions

X Ju, Y Gao, Z Zhang, Z Yuan… - Advances in …, 2025 - proceedings.neurips.cc
Sora's high motion intensity and long, consistent videos have significantly impacted the field
of video generation, attracting unprecedented attention. However, existing publicly available …

TC4D: Trajectory-conditioned text-to-4D generation

S Bahmani, X Liu, W Yifan, I Skorokhodov… - … on Computer Vision, 2024 - Springer
Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using
supervision from pre-trained text-to-video models. However, existing representations, such …

SF-V: Single forward video generation model

Z Zhang, Y Li, Y Wu, A Kag… - Advances in …, 2025 - proceedings.neurips.cc
Diffusion-based video generation models have demonstrated remarkable success in
obtaining high-fidelity videos through the iterative denoising process. However, these …

VD3D: Taming large video diffusion transformers for 3D camera control

S Bahmani, I Skorokhodov, A Siarohin… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of
complex videos from a text description. However, most existing models lack fine-grained …

LEO: Generative latent image animator for human video synthesis

Y Wang, X Ma, X Chen, C Chen, A Dantcheva… - International Journal of …, 2024 - Springer
Spatio-temporal coherency is a major challenge in synthesizing high-quality videos,
particularly in synthesizing human videos that contain rich global and local deformations. To …

VBench++: Comprehensive and versatile benchmark suite for video generative models

Z Huang, F Zhang, X Xu, Y He, J Yu, Z Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …