Filip: Fine-grained interactive language-image pre-training L Yao, R Huang, L Hou, G Lu, M Niu, H Xu, X Liang, Z Li, X Jiang, C Xu arXiv preprint arXiv:2111.07783, 2021 | 608 | 2021 |
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis J Chen, J Yu, C Ge, L Yao, E Xie, Y Wu, Z Wang, J Kwok, P Luo, H Lu, Z Li arXiv preprint arXiv:2310.00426, 2023 | 389 | 2023 |
Auto-fpn: Automatic network architecture adaptation for object detection beyond classification H Xu, L Yao, W Zhang, X Liang, Z Li Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 246 | 2019 |
Detclip: Dictionary-enriched visual-concept paralleled pre-training for open-world detection L Yao, J Han, Y Wen, X Liang, D Xu, W Zhang, Z Li, C Xu, H Xu Advances in Neural Information Processing Systems 35, 9125-9138, 2022 | 153 | 2022 |
Wukong: A 100 million large-scale chinese cross-modal pre-training benchmark J Gu, X Meng, G Lu, L Hou, N Minzhe, X Liang, L Yao, R Huang, W Zhang, ... Advances in Neural Information Processing Systems 35, 26418-26431, 2022 | 124 | 2022 |
PIXART-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation J Chen, C Ge, E Xie, Y Wu, L Yao, X Ren, Z Wang, P Luo, H Lu, Z Li European Conference on Computer Vision, 74-91, 2024 | 105 | 2024 |
SM-NAS: Structural-to-modular neural architecture search for object detection L Yao, H Xu, W Zhang, X Liang, Z Li Proceedings of the AAAI conference on artificial intelligence 34 (07), 12661 …, 2020 | 86 | 2020 |
Detclipv2: Scalable open-vocabulary object detection pre-training via word-region alignment L Yao, J Han, X Liang, D Xu, W Zhang, Z Li, H Xu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 82 | 2023 |
Detgpt: Detect what you need via reasoning R Pi, J Gao, S Diao, R Pan, H Dong, J Zhang, L Yao, J Han, H Xu, L Kong, ... arXiv preprint arXiv:2305.14167, 2023 | 78 | 2023 |
Difffit: Unlocking transferability of large diffusion models via simple parameter-efficient fine-tuning E Xie, L Yao, H Shi, Z Liu, D Zhou, Z Liu, J Li, Z Li Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 61 | 2023 |
Dit-3d: Exploring plain diffusion transformers for 3d shape generation S Mo, E Xie, R Chu, L Hong, M Niessner, Z Li Advances in neural information processing systems 36, 67960-67971, 2023 | 52 | 2023 |
G-detkd: Towards general distillation framework for object detectors via contrastive and semantic-guided feature imitation L Yao, R Pi, H Xu, W Zhang, Z Li, T Zhang Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 41* | 2021 |
Joint-detnas: Upgrade your detector with nas, pruning and dynamic distillation L Yao, R Pi, H Xu, W Zhang, Z Li, T Zhang Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 39 | 2021 |
Perceptiongpt: Effectively fusing visual perception into llm R Pi, L Yao, J Gao, J Zhang, T Zhang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 19 | 2024 |
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection L Yao, R Pi, J Han, X Liang, H Xu, W Zhang, Z Li, D Xu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 6 | 2024 |
Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction R Pi, L Yao, J Han, X Liang, W Zhang, H Xu The Twelfth International Conference on Learning Representations, 2024 | 2 | 2024 |
LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation J Wang, N Kang, L Yao, M Chen, C Wu, S Zhang, S Xue, Y Liu, T Wu, ... arXiv preprint arXiv:2501.12976, 2025 | | 2025 |
System and method for cross-modal interaction based on pre-trained model H Xu, HOU Lu, LU Guansong, NIU Minzhe, Z Li, R Huang, YAO Lewei, ... US Patent App. 17/900,592, 2024 | | 2024 |