VLMEvalKit: An open-source toolkit for evaluating large multi-modality models H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu, X Dong, Y Zang, P Zhang, ... Proceedings of the 32nd ACM International Conference on Multimedia, 11198-11201, 2024 | 39 | 2024 |
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding X Fang, K Mao, H Duan, X Zhao, Y Li, D Lin, K Chen arXiv preprint arXiv:2406.14515, 2024 | 29 | 2024 |
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection X Zhao, Y Chen, S Xu, X Li, X Wang, Y Li, H Huang arXiv preprint arXiv:2401.02361, 2024 | 23 | 2024 |
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning X Zhao, X Li, H Duan, H Huang, Y Li, K Chen, H Yang arXiv preprint arXiv:2406.17770, 2024 | 8 | 2024 |
VLMEvalKit: An open-source toolkit for evaluating large multi-modality models H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu, X Dong, Y Zang, P Zhang, ... URL https://arxiv.org/abs/2407.11691, 2024 | 8 | 2024 |
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language Y Chen, X Li, Y Li, Y Zeng, J Wu, X Zhao, K Chen arXiv preprint arXiv:2406.20085, 2024 | 2 | 2024 |
Redundancy Principles for MLLMs Benchmarks Z Zhang, X Zhao, X Fang, C Li, X Liu, X Min, H Duan, K Chen, G Zhai arXiv preprint arXiv:2501.13953, 2025 | | 2025 |