MM-LLMs: Recent advances in multimodal large language models
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …
MMBench: Is your multi-modal model an all-around player?
Large vision-language models (VLMs) have recently achieved remarkable progress,
exhibiting impressive multimodal perception and reasoning abilities. However, effectively …
MiniCPM-V: A GPT-4V level MLLM on your phone
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally
reshaped the landscape of AI research and industry, shedding light on a promising path …
Blink: Multimodal large language models can see but not perceive
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …
OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …
InternLM-XComposer2: Mastering free-form text-image composition and comprehension in vision-language large model
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-
form text-image composition and comprehension. This model goes beyond conventional …
MMT-Bench: A comprehensive multimodal benchmark for evaluating large vision-language models towards multitask AGI
Large Vision-Language Models (LVLMs) show significant strides in general-purpose
multimodal applications such as visual dialogue and embodied navigation. However …
MoAI: Mixture of all intelligence for large language and vision models
The rise of large language models (LLMs) and instruction tuning has led to the current trend
of instruction-tuned large language and vision models (LLVMs). This trend involves either …
A survey on evaluation of multimodal large language models
J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …
Video-CCAM: Enhancing video-language understanding with causal cross-attention masks for short and long videos
Multi-modal large language models (MLLMs) have demonstrated considerable potential
across various downstream tasks that require cross-domain knowledge. MLLMs capable of …