InternLM-XComposer2: Mastering free-form text-image composition and comprehension in vision-language large model X Dong, P Zhang, Y Zang, Y Cao, B Wang, L Ouyang, X Wei, S Zhang, ... arXiv preprint arXiv:2401.16420, 2024 | 234 | 2024 |
ShareGPT4Video: Improving video understanding and generation with better captions L Chen, X Wei, J Li, X Dong, P Zhang, Y Zang, Z Chen, H Duan, Z Tang, ... Advances in Neural Information Processing Systems 37, 19472-19495, 2025 | 101 | 2025 |
Evaluating and improving tool-augmented computation-intensive math reasoning B Zhang, K Zhou, X Wei, X Zhao, J Sha, S Wang, JR Wen Advances in Neural Information Processing Systems 36, 23570-23589, 2023 | 32 | 2023 |
MMDU: A multi-turn multi-image dialog understanding benchmark and instruction-tuning dataset for LVLMs Z Liu, T Chu, Y Zang, X Wei, X Dong, P Zhang, Z Liang, Y Xiong, Y Qiao, ... arXiv preprint arXiv:2406.11833, 2024 | 28 | 2024 |
InternLM-XComposer2.5-OmniLive: A comprehensive multimodal system for long-term streaming video and audio interactions P Zhang, X Dong, Y Cao, Y Zang, R Qian, X Wei, L Chen, Y Li, J Niu, ... arXiv preprint arXiv:2412.09596, 2024 | 4 | 2024 |
VideoRoPE: What Makes for Good Video Rotary Position Embedding? X Wei, X Liu, Y Zang, X Dong, P Zhang, Y Cao, J Tong, H Duan, Q Guo, ... arXiv preprint arXiv:2502.05173, 2025 | 1 | 2025 |