MMMU: A massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI | X Yue, Y Ni, K Zhang, T Zheng, R Liu, G Zhang, S Stevens, D Jiang, ... | CVPR 2024 Oral, 2024 | 590 | 2024 |
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion | D Jiang, X Ren, BY Lin | ACL 2023, 2023 | 260 | 2023 |
Mantis: Interleaved multi-image instruction tuning | D Jiang, X He, H Zeng, C Wei, M Ku, Q Liu, W Chen | TMLR 2024, 2024 | 80 | 2024 |
TIGERScore: Towards building explainable metric for all text generation tasks | D Jiang, Y Li, G Zhang, W Huang, BY Lin, W Chen | Transactions on Machine Learning Research 2024, 2024 | 42 | 2024 |
VieScore: Towards explainable metrics for conditional image synthesis evaluation | M Ku, D Jiang, C Wei, X Yue, W Chen | ACL 2024, 2023 | 36 | 2023 |
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | Y Lu, D Jiang, W Chen, WY Wang, Y Choi, BY Lin | NeurIPS 2024 Dataset and Benchmark Track, 2024 | 31* | 2024 |
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | X He, D Jiang, G Zhang, M Ku, A Soni, S Siu, H Chen, A Chandra, Z Jiang, ... | EMNLP 2024, 2024 | 27 | 2024 |
GenAI Arena: An Open Evaluation Platform for Generative Models | D Jiang, M Ku, T Li, Y Ni, S Sun, R Fan, W Chen | NeurIPS 2024 Dataset and Benchmark Track, 2024 | 12 | 2024 |
MEGA-Bench: Scaling multimodal evaluation to over 500 real-world tasks | J Chen, T Liang, S Siu, Z Wang, K Wang, Y Wang, Y Ni, W Zhu, Z Jiang, ... | ICLR 2025, 2025 | 3 | 2025 |
ACECODER: Acing Coder RL via Automated Test-Case Synthesis | H Zeng, D Jiang, H Wang, P Nie, X Chen, W Chen | arXiv preprint arXiv:2502.01718, 2025 | 1 | 2025 |