Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
A survey on hallucination in large vision-language models
Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However,`` …
attention within the AI landscape for its practical implementation potential. However,`` …
From large language models to large multimodal models: A literature review
With the deepening of research on Large Language Models (LLMs), significant progress has
been made in recent years on the development of Large Multimodal Models (LMMs), which …
been made in recent years on the development of Large Multimodal Models (LMMs), which …
Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi
We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …
How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
Visual autoregressive modeling: Scalable image generation via next-scale prediction
Abstract We present Visual AutoRegressive modeling (VAR), a new generation paradigm
that redefines the autoregressive learning on images as coarse-to-fine" next-scale …
that redefines the autoregressive learning on images as coarse-to-fine" next-scale …
Sharegpt4video: Improving video understanding and generation with better captions
Abstract We present the ShareGPT4Video series, aiming to facilitate the video
understanding of large video-language models (LVLMs) and the video generation of text-to …
understanding of large video-language models (LVLMs) and the video generation of text-to …
Internvideo2: Scaling foundation models for multimodal video understanding
We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve the
state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue. Our …
state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue. Our …
Are we on the right way for evaluating large vision-language models?
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …
Moe-llava: Mixture of experts for large vision-language models
Recent advances demonstrate that scaling Large Vision-Language Models (LVLMs)
effectively improves downstream task performances. However, existing scaling methods …
effectively improves downstream task performances. However, existing scaling methods …
Paligemma: A versatile 3b vlm for transfer
PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m
vision encoder and the Gemma-2B language model. It is trained to be a versatile and …
vision encoder and the Gemma-2B language model. It is trained to be a versatile and …