Evaluation of text-to-video generation models: A dynamics perspective

M Liao, Q Ye, W Zuo, F Wan, T Wang… - Advances in …, 2025 - proceedings.neurips.cc
Comprehensive and constructive evaluation protocols play an important role when
developing sophisticated text-to-video (T2V) generation models. Existing evaluation …

Diffusion model-based video editing: A survey

W Sun, RC Tu, J Liao, D Tao - arXiv preprint arXiv:2407.07111, 2024 - arxiv.org
The rapid development of diffusion models (DMs) has significantly advanced image and
video applications, making "what you want is what you see" a reality. Among these, video …

T2vsafetybench: Evaluating the safety of text-to-video generative models

Y Miao, Y Zhu, L Yu, J Zhu, XS Gao… - Advances in Neural …, 2025 - proceedings.neurips.cc
The recent development of Sora leads to a new era in text-to-video (T2V) generation. Along
with this comes the rising concern about its safety risks. The generated videos may contain …

Boosting text-to-video generative model with MLLMs feedback

X Wu, S Huang, G Wang, J Xiong… - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent advancements in text-to-video generative models, such as Sora, have showcased
impressive capabilities. These models have attracted significant interest for their potential …

Videoscore: Building automatic metrics to simulate fine-grained human feedback for video generation

X He, D Jiang, G Zhang, M Ku, A Soni, S Siu… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …

Video diffusion models: A survey

A Melnik, M Ljubljanac, C Lu, Q Yan, W Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion generative models have recently become a powerful technique for creating and
modifying high-quality, coherent video content. This survey provides a comprehensive …

A survey on multimodal wearable sensor-based human action recognition

J Ni, H Tang, ST Haque, Y Yan, AHH Ngu - arXiv preprint arXiv …, 2024 - arxiv.org
The combination of increased life expectancy and falling birth rates is resulting in an aging
population. Wearable Sensor-based Human Activity Recognition (WSHAR) emerges as a …

Align anything: Training all-modality models to follow instructions with language feedback

J Ji, J Zhou, H Lou, B Chen, D Hong, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) has proven effective in enhancing the
instruction-following capabilities of large language models; however, it remains …

Sakuga-42m dataset: Scaling up cartoon research

Z Pan - arXiv preprint arXiv:2405.07425, 2024 - arxiv.org
Hand-drawn cartoon animation employs sketches and flat-color segments to create the
illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive …

Videodpo: Omni-preference alignment for video diffusion generation

R Liu, H Wu, Z Ziqiang, C Wei, Y He, R Pi… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent progress in generative diffusion models has greatly advanced text-to-video
generation. While text-to-video models trained on large-scale, diverse datasets can produce …