A survey on evaluation of multimodal large language models

J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …

A survey on multimodal benchmarks: In the era of large AI models

L Li, G Chen, H Shi, J **ao, L Chen - arXiv preprint arXiv:2409.18142, 2024 - arxiv.org
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …

Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems

M Małkiński, S Pawlonka, J Mańdziuk - arXiv preprint arXiv:2411.01173, 2024 - arxiv.org
Abstract visual reasoning (AVR) encompasses a suite of tasks whose solving requires the
ability to discover common concepts underlying the set of pictures through an analogy …

Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task

M Vaishnav, T Tammet - arXiv preprint arXiv:2501.13620, 2025 - arxiv.org
Evaluating the reasoning capabilities of Vision-Language Models (VLMs) in complex visual
tasks provides valuable insights into their potential and limitations. In this work, we assess …

Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning

M Hersche, G Camposampiero, R Wattenhofer… - arXiv preprint arXiv …, 2024 - arxiv.org
This work compares large language models (LLMs) and neuro-symbolic approaches in
solving Raven's progressive matrices (RPM), a visual abstract reasoning test that involves …

The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks

IR Galatzer-Levy, D Munday, J McGiffin, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
There is increasing interest in tracking the capabilities of general intelligence foundation
models. This study benchmarks leading large language models and vision language …