DeepSeek-VL2: Mixture-of-experts vision-language models for advanced multimodal understanding
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language
Models that significantly improves upon its predecessor, DeepSeek-VL, through …
Apollo: An exploration of video understanding in large multimodal models
Despite the rapid integration of video perception capabilities into Large Multimodal Models
(LMMs), the underlying mechanisms driving their video understanding remain poorly …
MiniMax-01: Scaling foundation models with lightning attention
A Li, B Gong, B Yang, B Shan, C Liu, C Zhu… - arXiv preprint arXiv …, 2025 - arxiv.org
We introduce MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are
comparable to top-tier models while offering superior capabilities in processing longer …
MEGA-Bench: Scaling multimodal evaluation to over 500 real-world tasks
We present MEGA-Bench, an evaluation suite that scales multimodal evaluation to over 500
real-world tasks, to address the highly heterogeneous daily use cases of end users. Our …
WorldCuisines: A massive-scale benchmark for multilingual and multicultural visual question answering on global cuisines
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly
in languages other than English and in underrepresented cultural contexts. To evaluate their …
Enabling harmonious human-machine interaction with visual-context augmented dialogue system: A review
The intelligent dialogue system, aiming at communicating with humans harmoniously with
natural language, is brilliant for promoting the advancement of human-machine interaction …
TVBench: Redesigning Video-Language Evaluation
Large language models have demonstrated impressive performance when integrated with
vision models even enabling video understanding. However, evaluating these video models …
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Current large multimodal models (LMMs) face significant challenges in processing and
comprehending long-duration or high-resolution videos, which is mainly due to the lack of …
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
In recent years, vision language models (VLMs) have made significant advancements in
video understanding. However, a crucial capability, fine-grained motion comprehension …
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Perception and understanding are two pillars of computer vision. While multimodal large
language models (MLLM) have demonstrated remarkable visual understanding capabilities …