MMMU: A massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI X Yue, Y Ni, K Zhang, T Zheng, R Liu, G Zhang, S Stevens, D Jiang, ... CVPR 2024 (Oral); Best Paper Candidate, 2024 | 542 | 2024 |
Mantis: Interleaved multi-image instruction tuning D Jiang, X He, H Zeng, C Wei, M Ku, Q Liu, W Chen TMLR 2024, 2024 | 71 | 2024 |
ConsistI2V: Enhancing visual consistency for image-to-video generation W Ren, H Yang, G Zhang, C Wei, X Du, W Huang, W Chen TMLR 2024, 2024 | 49* | 2024 |
AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks M Ku*, C Wei*, W Ren*, H Yang, W Chen TMLR 2024 (Reproducibility Certification), 2024 | 34* | 2024 |
VIEScore: Towards explainable metrics for conditional image synthesis evaluation M Ku, D Jiang, C Wei, X Yue, W Chen ACL, 2023 | 34 | 2023 |
UniIR: Training and benchmarking universal multimodal information retrievers C Wei, Y Chen, H Chen, H Hu, G Zhang, J Fu, A Ritter, W Chen ECCV 2025 (Oral), 2025 | 32 | 2025 |
DreamEdit: Subject-driven image editing T Li, M Ku*, C Wei*, W Chen TMLR, 2023 | 28 | 2023 |
Sparsifiner: Learning sparse instance-dependent attention for efficient vision transformers C Wei*, B Duke*, R Jiang, P Aarabi, GW Taylor, F Shkurti CVPR 2023, 2023 | 14 | 2023 |
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation W Ren, H Yang, J Min, C Wei, W Chen arXiv preprint arXiv:2412.00927, 2024 | | 2024 |
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision C Wei*, Z Xiong*, W Ren, X Du, G Zhang, W Chen ICLR 2025, 2024 | | 2024 |