| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| LLaVA-OneVision: Easy Visual Task Transfer | B Li, Y Zhang, D Guo, R Zhang, F Li, H Zhang, K Zhang, P Zhang, Y Li, ... | arXiv preprint arXiv:2408.03326 | 245 | 2024 |
| Long Context Transfer from Language to Vision | P Zhang, K Zhang, B Li, G Zeng, J Yang, Y Zhang, Z Wang, H Tan, C Li, ... | arXiv preprint arXiv:2406.16852 | 74* | 2024 |
| LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities in the Wild | B Li, K Zhang, H Zhang, D Guo, R Zhang, F Li, Y Zhang, Z Liu, C Li | May 2024 | 49 | 2024 |
| LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | K Zhang, B Li, P Zhang, F Pu, JA Cahyono, K Hu, S Liu, Y Zhang, J Yang, ... | arXiv preprint arXiv:2407.12772 | 27 | 2024 |
| LMMs-Eval: Accelerating the Development of Large Multimodal Models | B Li, P Zhang, K Zhang, F Pu, X Du, Y Dong, H Liu, Y Zhang, G Zhang, ... | March 2024 | 21* | 2024 |
| WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | Y Zhang, K Zhang, B Li, F Pu, CA Setiadharma, J Yang, Z Liu | arXiv preprint arXiv:2405.03272 | 4 | 2024 |
| Large Multi-modal Models Can Interpret Features in Large Multi-modal Models | K Zhang, Y Shen, B Li, Z Liu | arXiv preprint arXiv:2411.14982 | | 2024 |
| MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures | J Ni, Y Song, D Ghosal, B Li, DJ Zhang, X Yue, F Xue, Z Zheng, K Zhang, ... | arXiv preprint arXiv:2410.13754 | | 2024 |