VideoScore: Building automatic metrics to simulate fine-grained human feedback for video generation
The recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …
A recipe for scaling up text-to-video generation with text-free videos
Diffusion-based text-to-video generation has witnessed impressive progress in the past year
yet still falls behind text-to-image generation. One of the key reasons is the limited scale of …
Video diffusion alignment via reward gradients
M Prabhudesai, R Mendonca, Z Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
We have made significant progress towards building foundational video diffusion models. As
these models are trained using large-scale unsupervised data, it has become crucial to …
T2V-Turbo-v2: Enhancing video generation model post-training through data, reward, and conditional guidance design
In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the
post-training phase by distilling a highly capable consistency model from a pretrained T2V …
Alignment of diffusion models: Fundamentals, challenges, and future
Diffusion models have emerged as the leading paradigm in generative modeling, excelling
in various applications. Despite their success, these models often misalign with human …
Animate-X: Universal character image animation with enhanced motion representation
Character image animation, which generates high-quality videos from a reference image
and target pose sequence, has seen significant progress in recent years. However, most …
Improving dynamic object interactions in text-to-video generation with AI feedback
Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …
Robust watermarking using generative priors against image editing: From benchmarking to advances
Current image watermarking methods are vulnerable to advanced image editing techniques
enabled by large-scale text-to-image models. These models can distort embedded …
VideoMaker: Zero-shot customized video generation with the inherent force of video diffusion models
Zero-shot customized video generation has gained significant attention due to its substantial
application potential. Existing methods rely on additional models to extract and inject …
Boosting text-to-video generative model with MLLMs feedback
Recent advancements in text-to-video generative models, such as Sora, have showcased
impressive capabilities. These models have attracted significant interest for their potential …