A survey on video diffusion models

Z **ng, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Lavie: High-quality video generation with cascaded latent diffusion models

Y Wang, X Chen, X Ma, S Zhou, Z Huang… - International Journal of …, 2024 - Springer
This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a
pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task …

Emo: Emote portrait alive generating expressive portrait videos with audio2video diffusion model under weak conditions

L Tian, Q Wang, B Zhang, L Bo - European Conference on Computer …, 2024 - Springer
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking
head video generation by focusing on the dynamic and nuanced relationship between audio …

Sparsectrl: Adding sparse controls to text-to-video diffusion models

Y Guo, C Yang, A Rao, M Agrawala, D Lin… - European Conference on …, 2024 - Springer
The development of text-to-video (T2V), ie, generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …

Vbench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Open-sora: Democratizing efficient video production for all

Z Zheng, X Peng, T Yang, C Shen, S Li, H Liu… - arxiv preprint arxiv …, 2024 - arxiv.org
Vision and language are the two foundational senses for humans, and they build up our
cognitive ability and intelligence. While significant breakthroughs have been made in AI …

Diffusion model-based video editing: A survey

W Sun, RC Tu, J Liao, D Tao - arxiv preprint arxiv:2407.07111, 2024 - arxiv.org
The rapid development of diffusion models (DMs) has significantly advanced image and
video applications, making" what you want is what you see" a reality. Among these, video …

Tc4d: Trajectory-conditioned text-to-4d generation

S Bahmani, X Liu, W Yifan, I Skorokhodov… - … on Computer Vision, 2024 - Springer
Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using
supervision from pre-trained text-to-video models. However, existing representations, such …

Endora: Video Generation Models as Endoscopy Simulators

C Li, H Liu, Y Liu, BY Feng, W Li, X Liu, Z Chen… - … Conference on Medical …, 2024 - Springer
Generative models hold promise for revolutionizing medical education, robot-assisted
surgery, and data augmentation for machine learning. Despite progress in generating 2D …

Miradata: A large-scale video dataset with long durations and structured captions

X Ju, Y Gao, Z Zhang, Z Yuan, X Wang, A Zeng… - arxiv preprint arxiv …, 2024 - arxiv.org
Sora's high-motion intensity and long consistent videos have significantly impacted the field
of video generation, attracting unprecedented attention. However, existing publicly available …