VLMEvalKit: An open-source toolkit for evaluating large multi-modality models H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu, X Dong, Y Zang, P Zhang, ... Proceedings of the 32nd ACM International Conference on Multimedia, 11198-11201, 2024 | 49 | 2024 |
MathBench: Evaluating the theory and application proficiency of LLMs with a hierarchical mathematics benchmark H Liu, Z Zheng, Y Qiao, H Duan, Z Fei, F Zhou, W Zhang, S Zhang, D Lin, ... arXiv preprint arXiv:2405.12209, 2024 | 32 | 2024 |
Prism: A framework for decoupling and assessing the capabilities of VLMs Y Qiao, H Duan, X Fang, J Yang, L Chen, S Zhang, J Wang, D Lin, ... Advances in Neural Information Processing Systems 37, 111863-111898, 2025 | 13 | 2025 |