VLMEvalKit: An open-source toolkit for evaluating large multi-modality models

H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu… - Proceedings of the …, 2024 - dl.acm.org
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models
based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework …

Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics

S Ghazanfari, S Garg, N Flammarion… - arXiv preprint arXiv …, 2024 - arxiv.org
Human perception of similarity across uni- and multimodal inputs is highly complex, making it
challenging to develop automated metrics that accurately mimic it. General purpose vision …

Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs

Z Zhang, Z Jia, H Wu, C Li, Z Chen, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rising interest in research on Large Multi-modal Models (LMMs) for video
understanding, many studies have emphasized general video comprehension capabilities …

MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis

Y Zhou, Z Zhang, J Cao, J Jia, Y Jiang, F Wen… - arXiv preprint arXiv …, 2024 - arxiv.org
Artificial Intelligence (AI) has demonstrated significant capabilities in various fields, and in
areas such as human-computer interaction (HCI), embodied intelligence, and the design …

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

F Yang, R Zhen, J Wang, Y Zhang, H Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
AIGC images are prevalent across various fields, yet they frequently suffer from quality
issues like artifacts and unnatural textures. Specialized models aim to predict defect region …

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

J Ni, Y Song, D Ghosal, B Li, DJ Zhang, X Yue… - arXiv preprint arXiv …, 2024 - arxiv.org
Perceiving and generating diverse modalities are crucial for AI models to effectively learn
from and engage with real-world signals, necessitating reliable evaluations for their …

Redundancy Principles for MLLMs Benchmarks

Z Zhang, X Zhao, X Fang, C Li, X Liu, X Min… - arXiv preprint arXiv …, 2025 - arxiv.org
With the rapid iteration of Multi-modality Large Language Models (MLLMs) and the evolving
demands of the field, the number of benchmarks produced annually has surged into the …
