How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui, W Tong, K Hu, J Luo, Z Ma, ... Science China Information Sciences 67 (12), 220101, 2024. | Cited by 348 | 2024 |
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language. Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen, Q Zhang, Z Lai, Y Yang, ... arXiv preprint arXiv:2305.05662, 2023. | Cited by 88 | 2023 |
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World. W Wang, M Shi, Q Li, W Wang, Z Huang, L Xing, Z Chen, H Li, X Zhu, ... The Twelfth International Conference on Learning Representations (ICLR 2024), 2023. | Cited by 78 | 2023 |
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures. Y Duan, W Wang, Z Chen, X Zhu, L Lu, T Lu, Y Qiao, H Li, J Dai, W Wang. arXiv preprint arXiv:2403.02308, 2024. | Cited by 36 | 2024 |
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-Modal Feature Synchronizer. C Tian, X Zhu, Y Xiong, W Wang, Z Chen, W Wang, Y Chen, L Lu, T Lu, ... arXiv preprint arXiv:2401.10208, 2024. | Cited by 36 | 2024 |
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World. W Wang, Y Ren, H Luo, T Li, C Yan, Z Chen, W Wang, Q Li, L Lu, X Zhu, ... The 18th European Conference on Computer Vision (ECCV 2024), 2024. | Cited by 35 | 2024 |
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling. Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui, J Zhu, S Ye, H Tian, Z Liu, ... arXiv preprint arXiv:2412.05271, 2024. | Cited by 22 | 2024 |
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity. Y Liu, Y Cao, Z Gao, W Wang, Z Chen, W Wang, H Tian, L Lu, X Zhu, T Lu, ... Science China Information Sciences 67 (12), 1-16, 2024. | Cited by 15 | 2024 |
Demystify Transformers & Convolutions in Modern Image Deep Networks. X Hu, M Shi, W Wang, S Wu, L Xing, W Wang, X Zhu, L Lu, J Zhou, ... IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. | Cited by 14 | 2024 |
Needle In A Multimodal Haystack. W Wang, S Zhang, Y Ren, Y Duan, T Li, S Liu, M Hu, Z Chen, K Zhang, ... arXiv preprint arXiv:2406.07230, 2024. | Cited by 11 | 2024 |
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text. Q Li, Z Chen, W Wang, W Wang, S Ye, Z Jin, G Chen, Y He, Z Gao, E Cui, ... arXiv preprint arXiv:2406.08418, 2024. | Cited by 10 | 2024 |
Mini-InternVL: A Flexible-Transfer Pocket Multi-Modal Model with 5% Parameters and 90% Performance. Z Gao, Z Chen, E Cui, Y Ren, W Wang, J Zhu, H Tian, S Ye, J He, X Zhu, ... Visual Intelligence 2 (1), 1-17, 2024. | Cited by 8 | 2024 |
InternVL2: Better than the Best—Expanding Performance Boundaries of Open-Source Multimodal Models with the Progressive Scaling Strategy. | Cited by 8* | 2024 |
CLIPText: A New Paradigm for Zero-Shot Text Classification. L Qin, W Wang, Q Chen, W Che. Findings of the Association for Computational Linguistics: ACL 2023, 1077-1088, 2023. | Cited by 6 | 2023 |
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization. W Wang, Z Chen, W Wang, Y Cao, Y Liu, Z Gao, J Zhu, X Zhu, L Lu, ... arXiv preprint arXiv:2411.10442, 2024. | Cited by 5 | 2024 |
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area. J Li, D Zhang, X Wang, Z Hao, J Lei, Q Tan, C Zhou, W Liu, Y Yang, ... arXiv preprint arXiv:2408.07246, 2024. | Cited by 4 | 2024 |