Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
A survey on video diffusion models
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …
Is sora a world simulator? a comprehensive survey on general world models and beyond
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …
Sharegpt4video: Improving video understanding and generation with better captions
We present the ShareGPT4Video series, aiming to facilitate the video understanding of large
video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) …
video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) …
Miradata: A large-scale video dataset with long durations and structured captions
Sora's high-motion intensity and long consistent videos have significantly impacted the field
of video generation, attracting unprecedented attention. However, existing publicly available …
of video generation, attracting unprecedented attention. However, existing publicly available …
Streaming long video understanding with large language models
This paper presents VideoStreaming, an advanced vision-language large model (VLLM) for
video understanding, that capably understands arbitrary-length video with a constant …
video understanding, that capably understands arbitrary-length video with a constant …
Hourvideo: 1-hour video-language understanding
K Chandrasegaran, A Gupta… - Advances in …, 2025 - proceedings.neurips.cc
We present HourVideo, a benchmark dataset for hour-long video-language understanding.
Our dataset consists of a novel task suite comprising summarization, perception (recall …
Our dataset consists of a novel task suite comprising summarization, perception (recall …
Video instruction tuning with synthetic data
The development of video large multimodal models (LMMs) has been hindered by the
difficulty of curating large amounts of high-quality raw data from the web. To address this, we …
difficulty of curating large amounts of high-quality raw data from the web. To address this, we …
Open-sora: Democratizing efficient video production for all
Vision and language are the two foundational senses for humans, and they build up our
cognitive ability and intelligence. While significant breakthroughs have been made in AI …
cognitive ability and intelligence. While significant breakthroughs have been made in AI …
Kangaroo: A powerful video-language model supporting long-context video input
Rapid advancements have been made in extending Large Language Models (LLMs) to
Large Multi-modal Models (LMMs). However, extending input modality of LLMs to video data …
Large Multi-modal Models (LMMs). However, extending input modality of LLMs to video data …
A survey of multimodal large language model from a data-centric perspective
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …
language models by integrating and processing data from multiple modalities, including text …