A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

Videopoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

Vbench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Dreamvideo: Composing your dream videos with customized subject and motion

Y Wei, S Zhang, Z Qing, H Yuan, Z Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customized generation using diffusion models has made impressive progress in image
generation, but remains unsatisfactory in the challenging video generation task, as it requires …

Direct-a-video: Customized video generation with user-directed camera movement and object motion

S Yang, L Hou, H Huang, C Ma, P Wan… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
Recent text-to-video diffusion models have achieved impressive progress. In practice, users
often desire the ability to control object motion and camera movement independently for …

Ccedit: Creative and controllable video editing via diffusion models

R Feng, W Weng, Y Wang, Y Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we present CCEdit, a versatile generative video editing framework based on
diffusion models. Our approach employs a novel trident network structure that separates …

Space-time diffusion features for zero-shot text-driven motion transfer

D Yatim, R Fridman, O Bar-Tal… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a new method for text-driven motion transfer: synthesizing a video that complies
with an input text prompt describing the target objects and scene while maintaining an input …

Videobooth: Diffusion-based video generation with image prompts

Y Jiang, T Wu, S Yang, C Si, D Lin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-driven video generation has witnessed rapid progress. However, merely using text prompts
is not enough to depict the desired subject appearance that accurately aligns with users' …

Customize-a-video: One-shot motion customization of text-to-video diffusion models

Y Ren, Y Zhou, J Yang, J Shi, D Liu, F Liu… - … on Computer Vision, 2024 - Springer
Image customization has been extensively studied in text-to-image (T2I) diffusion models,
leading to impressive outcomes and applications. With the emergence of text-to-video (T2V) …