DreamLLM: Synergistic multimodal comprehension and creation R Dong, C Han, Y Peng, Z Qi, Z Ge, J Yang, L Zhao, J Sun, H Zhou, H Wei, ... ICLR 2024 (Spotlight), 2023 | 162 | 2023 |
Task-specific inconsistency alignment for domain adaptive object detection L Zhao, L Wang CVPR 2022, 2022 | 112 | 2022 |
Vary: Scaling up the vision vocabulary for large vision-language models H Wei, L Kong, J Chen, L Zhao, Z Ge, J Yang, J Sun, C Han, X Zhang ECCV 2024, 2024 | 82 | 2024 |
ChatSpot: Bootstrapping multimodal LLMs via precise referring instruction tuning L Zhao, E Yu, Z Ge, J Yang, H Wei, H Zhou, J Sun, Y Peng, R Dong, ... IJCAI 2024 (Long Oral), 2023 | 47 | 2023 |
Small Language Model Meets with Reinforced Vision Vocabulary H Wei, L Kong, J Chen, L Zhao, Z Ge, E Yu, J Sun, C Han, X Zhang arXiv preprint arXiv:2401.12503, 2024 | 33 | 2024 |
Unified density-aware image dehazing and object detection in real-world hazy scenes Z Zhang, L Zhao, Y Liu, S Zhang, J Yang ACCV 2020, 2020 | 31 | 2020 |
Merlin: Empowering multimodal LLMs with foresight minds E Yu, L Zhao, Y Wei, J Yang, D Wu, L Kong, H Wei, T Wang, Z Ge, ... ECCV 2024, 2024 | 24 | 2024 |
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model H Wei, C Liu, J Chen, J Wang, L Kong, Y Xu, Z Ge, L Zhao, J Sun, Y Peng, ... arXiv preprint arXiv:2409.01704, 2024 | 22 | 2024 |
Hybrid resolution network using edge guided region mutual information loss for human parsing Y Liu, L Zhao, S Zhang, J Yang ACM MM 2020, 2020 | 18 | 2020 |
OneChart: Purify the chart structural extraction via one auxiliary token J Chen, L Kong, H Wei, C Liu, Z Ge, L Zhao, J Sun, C Han, X Zhang ACM MM 2024 (Oral), 2024 | 17 | 2024 |
Focus Anywhere for Fine-grained Multi-page Document Understanding C Liu, H Wei, J Chen, L Kong, Z Ge, Z Zhu, L Zhao, J Sun, C Han, ... arXiv preprint arXiv:2405.14295, 2024 | 16 | 2024 |
Logit normalization for long-tail object detection L Zhao, Y Teng, L Wang IJCV 132 (6), 2114-2134, 2024 | 9 | 2024 |
Self-supervised visual preference alignment K Zhu, L Zhao, Z Ge, X Zhang ACM MM 2024 (Oral), 2024 | 7 | 2024 |
Slow Perception: Let's Perceive Geometric Figures Step-by-step H Wei, Y Yin, Y Li, J Wang, L Zhao, J Sun, Z Ge, X Zhang arXiv preprint arXiv:2412.20631, 2024 | 1 | 2024 |
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction A Huang, B Wu, B Wang, C Yan, C Hu, C Feng, F Tian, F Shen, J Li, ... arXiv preprint arXiv:2502.11946, 2025 | | 2025 |
Unhackable Temporal Rewarding for Scalable Video MLLMs E Yu, K Lin, L Zhao, Y Wei, Z Zhu, H Wei, J Sun, Z Ge, X Zhang, J Wang, ... ICLR 2025, 2025 | | 2025 |
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model G Ma, H Huang, K Yan, L Chen, N Duan, S Yin, C Wan, R Ming, X Song, ... arXiv preprint arXiv:2502.10248, 2025 | | 2025 |
PerPO: Perceptual Preference Optimization via Discriminative Rewarding Z Zhu, L Zhao, K Lin, J Yang, E Yu, C Liu, H Wei, J Sun, Z Ge, X Zhang arXiv preprint arXiv:2502.04371, 2025 | | 2025 |