VideoScore: Building automatic metrics to simulate fine-grained human feedback for video generation

X He, D Jiang, G Zhang, M Ku, A Soni, S Siu… - arxiv preprint arxiv …, 2024 - arxiv.org
The recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …

A recipe for scaling up text-to-video generation with text-free videos

X Wang, S Zhang, H Yuan, Z Qing… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion-based text-to-video generation has witnessed impressive progress in the past year
yet still falls behind text-to-image generation. One of the key reasons is the limited scale of …

Video diffusion alignment via reward gradients

M Prabhudesai, R Mendonca, Z Qin… - arxiv preprint arxiv …, 2024 - arxiv.org
We have made significant progress towards building foundational video diffusion models. As
these models are trained using large-scale unsupervised data, it has become crucial to …

T2V-Turbo-v2: Enhancing video generation model post-training through data, reward, and conditional guidance design

J Li, Q Long, J Zheng, X Gao, R Piramuthu… - arxiv preprint arxiv …, 2024 - arxiv.org
In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the
post-training phase by distilling a highly capable consistency model from a pretrained T2V …

Alignment of diffusion models: Fundamentals, challenges, and future

B Liu, S Shao, B Li, L Bai, Z Xu, H Xiong, J Kwok… - arxiv preprint arxiv …, 2024 - arxiv.org
Diffusion models have emerged as the leading paradigm in generative modeling, excelling
in various applications. Despite their success, these models often misalign with human …

Animate-X: Universal character image animation with enhanced motion representation

S Tan, B Gong, X Wang, S Zhang, D Zheng… - arxiv preprint arxiv …, 2024 - arxiv.org
Character image animation, which generates high-quality videos from a reference image
and target pose sequence, has seen significant progress in recent years. However, most …

Improving dynamic object interactions in text-to-video generation with AI feedback

H Furuta, H Zen, D Schuurmans, A Faust… - arxiv preprint arxiv …, 2024 - arxiv.org
Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …

Robust watermarking using generative priors against image editing: From benchmarking to advances

S Lu, Z Zhou, J Lu, Y Zhu, AWK Kong - arxiv preprint arxiv:2410.18775, 2024 - arxiv.org
Current image watermarking methods are vulnerable to advanced image editing techniques
enabled by large-scale text-to-image models. These models can distort embedded …

VideoMaker: Zero-shot customized video generation with the inherent force of video diffusion models

T Wu, Y Zhang, X Cun, Z Qi, J Pu, H Dou… - arxiv preprint arxiv …, 2024 - arxiv.org
Zero-shot customized video generation has gained significant attention due to its substantial
application potential. Existing methods rely on additional models to extract and inject …

Boosting text-to-video generative model with MLLMs feedback

X Wu, S Huang, G Wang, J Xiong… - The Thirty-eighth Annual …, 2024 - openreview.net
Recent advancements in text-to-video generative models, such as Sora, have showcased
impressive capabilities. These models have attracted significant interest for their potential …