ShareGPT4V: Improving large multi-modal models with better captions L Chen, J Li, X Dong, P Zhang, C He, J Wang, F Zhao, D Lin ECCV 2024, 2024 | 449 | 2024 |
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu, X Dong, Y Zang, P Zhang, ... MMOpen 2024, 2024 | 234* | 2024 |
Reusing the task-specific classifier as a discriminator: Discriminator-free adversarial domain adaptation L Chen, H Chen, Z Wei, X Jin, X Tan, Y Jin, E Chen CVPR 2022, 2022 | 179 | 2022 |
Are We on the Right Way for Evaluating Large Vision-Language Models? L Chen, J Li, X Dong, P Zhang, Y Zang, Z Chen, H Duan, J Wang, Y Qiao, ... NeurIPS 2024, 2024 | 142 | 2024 |
ShareGPT4Video: Improving video understanding and generation with better captions L Chen, X Wei, J Li, X Dong, P Zhang, Y Zang, Z Chen, H Duan, B Lin, ... NeurIPS 2024, 2024 | 87 | 2024 |
Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation L Chen, Z Wei, X Jin, H Chen, M Zheng, K Chen, Y Jin NeurIPS 2022, 2022 | 45 | 2022 |
Freedrag: Point tracking is not you need for interactive point-based image editing P Ling, L Chen, P Zhang, H Chen, Y Jin CVPR 2024, 2023 | 36* | 2023 |
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation Z Wei, L Chen, Y Jin, X Ma, T Liu, P Lin, B Wang, H Chen, J Zheng CVPR 2024, 2023 | 29 | 2023 |
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Y Qiao, H Duan, X Fang, J Yang, L Chen, S Zhang, J Wang, D Lin, ... NeurIPS 2024, 2024 | 10 | 2024 |
Open-sora plan: Open-source large video generation model B Lin, Y Ge, X Cheng, Z Li, B Zhu, S Wang, X He, Y Ye, S Yuan, L Chen, ... arXiv preprint arXiv:2412.00131, 2024 | 9 | 2024 |
Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement Z Wei, L Chen, T Tu, H Chen, P Ling, Y Jin ICCV 2023, 2023 | 9 | 2023 |
InternLM-XComposer2.5-OmniLive: A comprehensive multimodal system for long-term streaming video and audio interactions P Zhang, X Dong, Y Cao, Y Zang, R Qian, X Wei, L Chen, Y Li, J Niu, ... arXiv preprint arXiv:2412.09596, 2024 | 2 | 2024 |
InternLM-XComposer-2.5: A versatile large vision language model supporting long-contextual input and output P Zhang, X Dong, Y Zang, Y Cao, R Qian, L Chen, Q Guo, H Duan, ... arXiv preprint arXiv:2407.03320, 2024 | | 2024 |