A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has seen substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …
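
As background for the surveyed methods, here is a minimal sketch of the denoising-diffusion training step they share; the toy denoiser and linear noise schedule below are illustrative stand-ins, not anything taken from the survey itself.

```python
# Minimal sketch of a DDPM-style training step. The stand-in denoiser and
# linear noise schedule are illustrative assumptions, not from the survey.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative alpha-bar_t

def diffusion_loss(denoiser, x0):
    """One training step: noise clean data x0, train to predict the noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps   # sample from q(x_t | x_0)
    return F.mse_loss(denoiser(x_t, t), eps)

# Toy inputs: a batch of 8-frame, 16x16 single-channel clips.
denoiser = lambda x, t: torch.zeros_like(x)      # stand-in network
x0 = torch.randn(4, 8, 16, 16)                   # (batch, frames, H, W)
print(diffusion_loss(denoiser, x0))
```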

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …
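
VBench scores generated videos along many disentangled quality dimensions. As a hypothetical stand-in for one such dimension, the sketch below computes a crude temporal-consistency score from raw pixels; it is not VBench's actual metric.

```python
# Hypothetical per-dimension video score: mean cosine similarity between
# consecutive frames as a crude temporal-consistency proxy. This is an
# illustrative stand-in, NOT the metric VBench actually uses.
import torch
import torch.nn.functional as F

def temporal_consistency(video: torch.Tensor) -> float:
    """video: (frames, C, H, W), values in [0, 1]."""
    flat = video.flatten(start_dim=1)                      # (frames, C*H*W)
    sims = F.cosine_similarity(flat[:-1], flat[1:], dim=1) # frame-to-frame
    return sims.mean().item()

video = torch.rand(16, 3, 64, 64)
print(f"temporal consistency: {temporal_consistency(video):.3f}")
```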

VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2023 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

A survey of multimodal-guided image editing with text-to-image diffusion models

X Shuai, H Ding, X Ma, R Tu, YG Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Image editing aims to modify a given synthetic or real image to meet users' specific
requirements. It has been widely studied in recent years as a promising and challenging field of …

VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
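
VideoPoet frames video generation as next-token prediction over discrete video (and audio) tokens. Below is a minimal sketch of that autoregressive decoding loop, with a random-logits stand-in in place of the real tokenizer and decoder-only transformer.

```python
# Sketch of autoregressive token decoding, the paradigm VideoPoet uses.
# The vocabulary size and fake_logits model are illustrative assumptions.
import torch

VOCAB = 1024  # assumed codebook size of the video tokenizer

def fake_logits(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for a decoder-only transformer: returns random logits."""
    return torch.randn(VOCAB)

def generate(prompt_tokens, n_new, temperature=1.0):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        logits = fake_logits(torch.tensor(tokens)) / temperature
        probs = torch.softmax(logits, dim=-1)
        tokens.append(torch.multinomial(probs, 1).item())
    return tokens  # a real system would detokenize these back to pixels

print(generate([1, 2, 3], n_new=8))
```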

Video-P2P: Video editing with cross-attention control

S Liu, Y Zhang, W Li, Z Lin, J Jia - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Video-P2P is the first framework for real-world video editing with cross-attention control.
While attention control has proven effective for image editing with pre-trained image …
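
The core mechanism here is prompt-to-prompt style cross-attention control: record the attention maps from a pass with the source prompt, then inject them during the edited-prompt pass so spatial layout is preserved while token content changes. The sketch below uses illustrative shapes and is not the paper's implementation.

```python
# Cross-attention control in miniature: reuse source-prompt attention maps
# during the edited-prompt pass. Dimensions here are illustrative.
import torch

def cross_attention(q, k, v, injected_attn=None):
    """q: (tokens_q, d); k, v: (tokens_kv, d)."""
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.T * scale, dim=-1)  # (tokens_q, tokens_kv)
    if injected_attn is not None:
        attn = injected_attn                       # swap in the source maps
    return attn @ v, attn

q = torch.randn(64, 32)                                  # spatial queries
k_src, v_src = torch.randn(8, 32), torch.randn(8, 32)    # source prompt
k_edit, v_edit = torch.randn(8, 32), torch.randn(8, 32)  # edited prompt

_, attn_src = cross_attention(q, k_src, v_src)           # record source maps
out, _ = cross_attention(q, k_edit, v_edit, injected_attn=attn_src)
```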

ControlVideo: Training-free controllable text-to-video generation

Y Zhang, Y Wei, D Jiang, X Zhang, W Zuo… - arXiv preprint arXiv …, 2023 - arxiv.org
Text-driven diffusion models have unlocked unprecedented abilities in image generation,
whereas their video counterparts still lag behind due to the excessive training cost of …

Rerender a video: Zero-shot text-guided video-to-video translation

S Yang, Y Zhou, Z Liu, CC Loy - SIGGRAPH Asia 2023 Conference …, 2023 - dl.acm.org
Large text-to-image diffusion models have exhibited impressive proficiency in generating
high-quality images. However, when applying these models to the video domain, ensuring …

CoDeF: Content deformation fields for temporally consistent video processing

H Ouyang, Q Wang, Y Xiao, Q Bai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present the content deformation field (CoDeF) as a new type of video representation,
which consists of a canonical content field aggregating the static contents in the entire video …
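
The CoDeF idea in miniature: a video is one canonical 2D image plus a per-frame deformation field, and each frame is reconstructed by sampling the canonical content at deformed coordinates. The sketch below uses plain tensors where CoDeF fits learned fields, so names and shapes are illustrative.

```python
# Canonical content field + per-frame deformation, sketched with plain
# tensors instead of the learned fields the actual CoDeF method fits.
import torch
import torch.nn.functional as F

canonical = torch.rand(1, 3, 64, 64)  # shared static content of the video

def render_frame(deform: torch.Tensor) -> torch.Tensor:
    """deform: (H, W, 2) offsets in normalized [-1, 1] coordinates."""
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, 64), torch.linspace(-1, 1, 64), indexing="ij"
    )
    base = torch.stack([xs, ys], dim=-1)     # identity sampling grid (x, y)
    grid = (base + deform).unsqueeze(0)      # (1, H, W, 2)
    return F.grid_sample(canonical, grid, align_corners=True)

frame0 = render_frame(torch.zeros(64, 64, 2))         # zero deformation
frame1 = render_frame(0.05 * torch.randn(64, 64, 2))  # small warp
```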

DragonDiffusion: Enabling drag-style manipulation on diffusion models

C Mou, X Wang, J Song, Y Shan, J Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Although existing large-scale text-to-image (T2I) models can generate high-quality
images from detailed textual descriptions, they often lack the ability to precisely edit the …
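
The drag-editing idea can be sketched as gradient guidance: define an energy that is low when the features at the drag target match the features originally found at the handle, then step the latent along its gradient. The identity feature extractor and single update below are illustrative stand-ins, not DragonDiffusion's actual guidance.

```python
# Drag-style guidance sketch: pull features at the target location toward
# the handle's original features via a gradient step on the latent. The
# identity "features" function stands in for real U-Net features.
import torch
import torch.nn.functional as F

def features(latent: torch.Tensor) -> torch.Tensor:
    """Stand-in for intermediate diffusion features: (C, H, W)."""
    return latent

latent = torch.randn(4, 32, 32, requires_grad=True)
handle, target = (8, 8), (8, 16)  # drag from handle to target

with torch.no_grad():             # record the handle's original features
    handle_feat = features(latent)[:, handle[0], handle[1]].clone()

feat = features(latent)
energy = F.mse_loss(feat[:, target[0], target[1]], handle_feat)
energy.backward()

step = 0.1
guided_latent = latent.detach() - step * latent.grad  # one guidance step
```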