Blink: Multimodal large language models can see but not perceive
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …
Evaluating text-to-visual generation with image-to-text generation
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …
Explainable and interpretable multimodal large language models: A comprehensive survey
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …
LHRS-Bot: Empowering remote sensing with VGI-enhanced large multimodal language model
The revolutionary capabilities of large language models (LLMs) have paved the way for
multimodal large language models (MLLMs) and fostered diverse applications across …
Kangaroo: A powerful video-language model supporting long-context video input
Rapid advancements have been made in extending Large Language Models (LLMs) to
Large Multi-modal Models (LMMs). However, extending input modality of LLMs to video data …
MME-Survey: A comprehensive survey on evaluation of multimodal LLMs
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …
Automated evaluation of large vision-language models on self-driving corner cases
Large Vision-Language Models (LVLMs) have received widespread attention for advancing
interpretable self-driving. Existing evaluations of LVLMs primarily focus on multi-faceted …
MMMU-Pro: A more robust multi-discipline multimodal understanding benchmark
This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline
Multimodal Understanding and Reasoning (MMMU) benchmark. MMMU-Pro rigorously …
A survey on evaluation of multimodal large language models
J. Huang, J. Zhang - arXiv preprint arXiv:2408.15769, 2024
Multimodal Large Language Models (MLLMs) mimic the human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …
VHELM: A holistic evaluation of vision language models
Current benchmarks for assessing vision-language models (VLMs) often focus on their
perception or problem-solving capabilities and neglect other critical aspects such as …