Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
Mminstruct: A high-quality multi-modal instruction tuning dataset with extensive diversity
Despite the effectiveness of vision-language supervised fine-tuning in enhancing the
performance of vision large language models (VLLMs), existing visual instruction tuning …
performance of vision large language models (VLLMs), existing visual instruction tuning …
Visual prompting in multimodal large language models: A survey
Multimodal large language models (MLLMs) equip pre-trained large-language models
(LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied …
(LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied …
Mmfuser: Multimodal multi-layer feature fuser for fine-grained vision-language understanding
Despite significant advancements in Multimodal Large Language Models (MLLMs) for
understanding complex human intentions through cross-modal interactions, capturing …
understanding complex human intentions through cross-modal interactions, capturing …
Task preference optimization: Improving multimodal large language models with vision task alignment
Current multimodal large language models (MLLMs) struggle with fine-grained or precise
understanding of visuals though they give comprehensive perception and reasoning in a …
understanding of visuals though they give comprehensive perception and reasoning in a …
Geoground: A unified large vision-language model. for remote sensing visual grounding
Remote sensing (RS) visual grounding aims to use natural language expression to locate
specific objects (in the form of the bounding box or segmentation mask) in RS images …
specific objects (in the form of the bounding box or segmentation mask) in RS images …
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
Large visual-language models (LVLMs) have achieved great success in multiple
applications. However, they still encounter challenges in complex scenes, especially those …
applications. However, they still encounter challenges in complex scenes, especially those …
Multimodal 3D Reasoning Segmentation with Complex Scenes
The recent development in multimodal learning has greatly advanced the research in 3D
scene understanding in various real-world tasks such as embodied AI. However, most …
scene understanding in various real-world tasks such as embodied AI. However, most …
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
Multi-modal large language models (MLLMs) have achieved remarkable success in fine-
grained visual understanding across a range of tasks. However, they often encounter …
grained visual understanding across a range of tasks. However, they often encounter …
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
Driving World Models (DWMs) have become essential for autonomous driving by enabling
future scene prediction. However, existing DWMs are limited to scene generation and fail to …
future scene prediction. However, existing DWMs are limited to scene generation and fail to …
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Perception and understanding are two pillars of computer vision. While multimodal large
language models (MLLM) have demonstrated remarkable visual understanding capabilities …
language models (MLLM) have demonstrated remarkable visual understanding capabilities …