DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning D Guo, D Yang, H Zhang, J Song, R Zhang, R Xu, Q Zhu, S Ma, P Wang, ... arXiv preprint arXiv:2501.12948, 2025 | 196 | 2025 |
Fast vision transformers with HiLo attention Z Pan, J Cai, B Zhuang NeurIPS 2022 (Spotlight), 2022 | 195 | 2022 |
Scalable vision transformers with hierarchical pooling Z Pan, B Zhuang, J Liu, H He, J Cai Proceedings of the IEEE/CVF International Conference on Computer Vision, 377-386, 2021 | 179 | 2021 |
DeepSeek-V3 technical report A Liu, B Feng, B Xue, B Wang, B Wu, C Lu, C Zhao, C Deng, C Zhang, ... arXiv preprint arXiv:2412.19437, 2024 | 158 | 2024 |
Object-and-action aware model for visual language navigation Y Qi, Z Pan, S Zhang, A van den Hengel, Q Wu European Conference on Computer Vision, 303-317, 2020 | 125 | 2020 |
Less is more: Pay less attention in vision transformers Z Pan, B Zhuang, H He, J Liu, J Cai Proceedings of the AAAI Conference on Artificial Intelligence 36 (2), 2035-2043, 2022 | 99 | 2022 |
The road to know-where: An object-and-room informed sequential BERT for indoor vision-language navigation Y Qi, Z Pan, Y Hong, MH Yang, A van den Hengel, Q Wu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 88 | 2021 |
A Survey on Efficient Training of Transformers B Zhuang, J Liu, Z Pan, H He, Y Weng, C Shen IJCAI 2023, 2023 | 64 | 2023 |
Pruning self-attentions into convolutional layers in single path H He, J Cai, J Liu, Z Pan, J Zhang, D Tao, B Zhuang IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (5), 3910-3922, 2024 | 51 | 2024 |
An efficient spatio-temporal pyramid transformer for action detection Y Weng, Z Pan, M Han, X Chang, B Zhuang European Conference on Computer Vision, 358-375, 2022 | 37 | 2022 |
EcoFormer: Energy-saving attention with linear complexity J Liu, Z Pan, H He, J Cai, B Zhuang NeurIPS 2022 (Spotlight), 2022 | 35 | 2022 |
MiniCache: KV cache compression in depth dimension for large language models A Liu, J Liu, Z Pan, Y He, R Haffari, B Zhuang Advances in Neural Information Processing Systems 37, 139997-140031, 2025 | 33 | 2025 |
Stitchable Neural Networks Z Pan, J Cai, B Zhuang CVPR 2023 (Highlight), 2023 | 33 | 2023 |
Janus: Decoupling visual encoding for unified multimodal understanding and generation C Wu, X Chen, Z Wu, Y Ma, X Liu, Z Pan, W Liu, Z Xie, X Yu, C Ruan, ... arXiv preprint arXiv:2410.13848, 2024 | 28 | 2024 |
Dynamic Focus-aware Positional Queries for Semantic Segmentation H He, J Cai, Z Pan, J Liu, J Zhang, D Tao, B Zhuang CVPR 2023, 2022 | 22 | 2022 |
Mesa: A memory-saving training framework for transformers Z Pan, P Chen, H He, J Liu, J Cai, B Zhuang arXiv preprint arXiv:2111.11124, 2021 | 20 | 2021 |
T-Stitch: Accelerating sampling in pre-trained diffusion models with trajectory stitching Z Pan, B Zhuang, DA Huang, W Nie, Z Yu, C Xiao, J Cai, A Anandkumar ICLR 2025, 2024 | 16 | 2024 |
DeepSeek-VL2: Mixture-of-experts vision-language models for advanced multimodal understanding Z Wu, X Chen, Z Pan, X Liu, W Liu, D Dai, H Gao, Y Ma, C Wu, B Wang, ... arXiv preprint arXiv:2412.10302, 2024 | 14 | 2024 |
Janus-Pro: Unified multimodal understanding and generation with data and model scaling X Chen, Z Wu, X Liu, Z Pan, W Liu, Z Xie, X Yu, C Ruan arXiv preprint arXiv:2501.17811, 2025 | 8 | 2025 |
JanusFlow: Harmonizing autoregression and rectified flow for unified multimodal understanding and generation Y Ma, X Liu, X Chen, W Liu, C Wu, Z Wu, Z Pan, Z Xie, H Zhang, L Zhao, ... arXiv preprint arXiv:2411.07975, 2024 | 4 | 2024 |