Uni-Perceiver: Pre-training unified architecture for generic perception for zero-shot and few-shot tasks X Zhu, J Zhu, H Li, X Wu, H Li, X Wang, J Dai CVPR 2022, 16804-16815, 2022 | 136 | 2022 |
Complementary relation contrastive distillation J Zhu, S Tang, D Chen, S Yu, Y Liu, M Rong, A Yang, X Wang CVPR 2021, 9260-9269, 2021 | 105 | 2021 |
Layerwise optimization by gradient decomposition for continual learning S Tang, D Chen, J Zhu, S Yu, W Ouyang CVPR 2021, 9634-9643, 2021 | 76 | 2021 |
Uni-Perceiver-MoE: Learning sparse generalist models with conditional MoEs J Zhu, X Zhu, W Wang, X Wang, H Li, X Wang, J Dai NeurIPS 2022, 2022 | 64 | 2022 |
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks H Li, J Zhu, X Jiang, X Zhu, H Li, C Yuan, X Wang, Y Qiao, X Wang, ... CVPR 2023, 2022 | 56 | 2022 |
SEED-X: Multimodal models with unified multi-granularity comprehension and generation Y Ge, S Zhao, J Zhu, Y Ge, K Yi, L Song, C Li, X Ding, Y Shan arXiv preprint arXiv:2404.14396, 2024 | 54 | 2024 |
A deep learning method to detect foreign objects for inspecting power transmission lines J Zhu, Y Guo, F Yue, H Yuan, A Yang, X Wang, M Rong IEEE Access 8, 94065-94075, 2020 | 41 | 2020 |
VL-GPT: A generative pre-trained transformer for vision and language understanding and generation J Zhu, X Ding, Y Ge, Y Ge, S Zhao, H Zhao, X Wang, Y Shan arXiv preprint arXiv:2312.09251, 2023 | 29 | 2023 |
Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui, J Zhu, S Ye, H Tian, Z Liu, ... arXiv preprint arXiv:2412.05271, 2024 | 23 | 2024 |
Multiple domain experts collaborative learning: Multi-source domain generalization for person re-identification S Yu, F Zhu, D Chen, R Zhao, H Chen, S Tang, J Zhu, Y Qiao arXiv preprint arXiv:2105.12355, 2021 | 23 | 2021 |
VLAttack: Multimodal adversarial attacks on vision-language tasks via pre-trained models Z Yin, M Ye, T Zhang, T Du, J Zhu, H Liu, J Chen, T Wang, F Ma Advances in Neural Information Processing Systems 36, 2024 | 21 | 2024 |
Enhanced sensing of sulfur hexafluoride decomposition components based on noble-metal-functionalized cerium oxide A Yang, W Li, J Chu, D Wang, H Yuan, J Zhu, X Wang, M Rong Materials & Design 187, 108391, 2020 | 20 | 2020 |
Crowded human detection via an anchor-pair network J Zhu, Z Yuan, C Zhang, W Chi, Y Ling WACV 2020, 1391-1399, 2020 | 9 | 2020 |
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance Z Gao, Z Chen, E Cui, Y Ren, W Wang, J Zhu, H Tian, S Ye, J He, X Zhu, ... Visual Intelligence 2 (1), 1-17, 2024 | 8 | 2024 |
Enhancing the reasoning ability of multimodal large language models via mixed preference optimization W Wang, Z Chen, W Wang, Y Cao, Y Liu, Z Gao, J Zhu, X Zhu, L Lu, ... arXiv preprint arXiv:2411.10442, 2024 | 6 | 2024 |
Welding joints inspection via residual attention network J Zhu, Z Yuan, T Liu 2019 16th International Conference on Machine Vision Applications (MVA), 1-5, 2019 | 3 | 2019 |
Power-LLaVA: Large language and vision assistant for power transmission line inspection J Wang, M Li, H Luo, J Zhu, A Yang, M Rong, X Wang 2024 IEEE International Conference on Image Processing (ICIP), 963-969, 2024 | 2 | 2024 |
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding H Li, C Tian, J Shao, X Zhu, Z Wang, J Zhu, W Dou, X Wang, H Li, L Lu, ... arXiv preprint arXiv:2412.09604, 2024 | 1 | 2024 |
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding J Ge, Z Chen, J Lin, J Zhu, X Liu, J Dai, X Zhu arXiv preprint arXiv:2412.09616, 2024 | | 2024 |
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning C Yang, X Zhu, J Zhu, W Su, J Wang, X Dong, W Wang, L Lu, B Li, J Zhou, ... arXiv preprint arXiv:2406.07543, 2024 | | 2024 |