VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models
based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework …
Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics
Human perception of similarity across uni- and multi-modal inputs is highly complex, making it
challenging to develop automated metrics that accurately mimic it. General purpose vision …
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
With the rising interest in research on Large Multi-modal Models (LMMs) for video
understanding, many studies have emphasized general video comprehension capabilities …
MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis
Artificial Intelligence (AI) has demonstrated significant capabilities in various fields, and in
areas such as human-computer interaction (HCI), embodied intelligence, and the design …
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
F Yang, R Zhen, J Wang, Y Zhang, H Chen… - arxiv preprint arxiv …, 2024 - arxiv.org
AIGC images are prevalent across various fields, yet they frequently suffer from quality
issues like artifacts and unnatural textures. Specialized models aim to predict defect region …
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
Perceiving and generating diverse modalities are crucial for AI models to effectively learn
from and engage with real-world signals, necessitating reliable evaluations for their …
Redundancy Principles for MLLMs Benchmarks
With the rapid iteration of Multi-modality Large Language Models (MLLMs) and the evolving
demands of the field, the number of benchmarks produced annually has surged into the …
demands of the field, the number of benchmarks produced annually has surged into the …