- Academic Search

VLMEvalKit: An open-source toolkit for evaluating large multi-modality models

H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu… - Proceedings of the …, 2024 - dl.acm.org

We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models
based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework …

Cited by 39

[PDF] arxiv.org

Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics

S Ghazanfari, S Garg, N Flammarion… - arXiv preprint arXiv …, 2024 - arxiv.org

Human perception of similarity across uni- and multimodal inputs is highly complex, making it
challenging to develop automated metrics that accurately mimic it. General purpose vision …

[PDF] arxiv.org

Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs

Z Zhang, Z Jia, H Wu, C Li, Z Chen, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

With the rising interest in research on Large Multi-modal Models (LMMs) for video
understanding, many studies have emphasized general video comprehension capabilities …

[PDF] arxiv.org

MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis

Y Zhou, Z Zhang, J Cao, J Jia, Y Jiang, F Wen… - arXiv preprint arXiv …, 2024 - arxiv.org

Artificial Intelligence (AI) has demonstrated significant capabilities in various fields, and in
areas such as human-computer interaction (HCI), embodied intelligence, and the design …

[PDF] arxiv.org

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

F Yang, R Zhen, J Wang, Y Zhang, H Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

AIGC images are prevalent across various fields, yet they frequently suffer from quality
issues like artifacts and unnatural textures. Specialized models aim to predict defect region …

[PDF] arxiv.org

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

J Ni, Y Song, D Ghosal, B Li, DJ Zhang, X Yue… - arXiv preprint arXiv …, 2024 - arxiv.org

Perceiving and generating diverse modalities are crucial for AI models to effectively learn
from and engage with real-world signals, necessitating reliable evaluations for their …

[PDF] arxiv.org

Redundancy Principles for MLLMs Benchmarks

Z Zhang, X Zhao, X Fang, C Li, X Liu, X Min… - arXiv preprint arXiv …, 2025 - arxiv.org

With the rapid iteration of Multi-modality Large Language Models (MLLMs) and the evolving
demands of the field, the number of benchmarks produced annually has surged into the …

[PDF] openreview.net

MIXEVAL-X: ANY-TO-ANY EVALUATIONS FROM REAL-WORLD DATA MIXTURES

AE Steps, BL Zeppelin, C Queen - openreview.net

Perceiving and generating diverse modalities are crucial for AI models to effectively learn
from and engage with real-world signals, necessitating reliable evaluations for their …
