A survey on video diffusion models
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …
Evaluating text-to-visual generation with image-to-text generation
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …
Deepfake: definitions, performance metrics and standards, datasets, and a meta-review
Recent advancements in AI, especially deep learning, have contributed to a significant
increase in the creation of new realistic-looking synthetic media (video, image, and audio) …
VideoScore: Building automatic metrics to simulate fine-grained human feedback for video generation
The recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …
ChronoMagic-Bench: A benchmark for metamorphic evaluation of text-to-time-lapse video generation
We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to
evaluate the temporal and metamorphic capabilities of the T2V models (e.g., Sora and …
Evaluating and Improving Compositional Text-to-Visual Generation
While text-to-visual models now produce photo-realistic images and videos, they struggle
with compositional text prompts involving attributes, relationships, and higher-order …
Subjective-aligned dataset and metric for text-to-video quality assessment
With the rapid development of generative models, AI-Generated Content (AIGC) has
exponentially increased in daily lives. Among them, Text-to-Video (T2V) generation has …
Is Sora a world simulator? A comprehensive survey on general world models and beyond
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …
Improving dynamic object interactions in text-to-video generation with AI feedback
Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …