Vision Mamba: Efficient visual representation learning with bidirectional state space model L Zhu, B Liao, Q Zhang, X Wang, W Liu, X Wang arXiv preprint arXiv:2401.09417, 2024 | 1053 | 2024 |
GraFormer: Graph-oriented transformer for 3D pose estimation W Zhao, W Wang, Y Tian Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 118 | 2022 |
HiViT: A simpler and more efficient design of hierarchical vision transformer X Zhang, Y Tian, L Xie, W Huang, Q Dai, Q Ye, Q Tian The Eleventh International Conference on Learning Representations, 2023 | 66 | 2023 |
Spatial transform decoupling for oriented object detection H Yu, Y Tian, Q Ye, Y Liu Proceedings of the AAAI Conference on Artificial Intelligence 38 (7), 6782-6790, 2024 | 34 | 2024 |
Discretization-aware architecture search Y Tian, C Liu, L Xie, Q Ye Pattern Recognition 120, 108186, 2021 | 31 | 2021 |
HiViT: Hierarchical vision transformer meets masked image modeling X Zhang, Y Tian, W Huang, Q Ye, Q Dai, L Xie, Q Tian arXiv preprint arXiv:2205.14949, 2022 | 30 | 2022 |
Adaptive linear span network for object skeleton detection C Liu, Y Tian, Z Chen, J Jiao, Q Ye IEEE Transactions on Image Processing 30, 5096-5108, 2021 | 30 | 2021 |
Integrally pre-trained transformer pyramid networks Y Tian, L Xie, Z Wang, L Wei, X Zhang, J Jiao, Y Wang, Q Tian, Q Ye Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 29 | 2023 |
GraFormer: Graph convolution transformer for 3D pose estimation W Zhao, Y Tian, Q Ye, J Jiao, W Wang arXiv preprint arXiv:2109.08364, 2021 | 28 | 2021 |
VMamba: Visual state space model Y Liu, Y Tian, Y Zhao, H Yu, L Xie, Y Wang, Q Ye, Y Liu arXiv preprint arXiv:2401.10166, 2024 | 22 | 2024 |
Beyond masking: Demystifying token-based pre-training for vision transformers Y Tian, L Xie, J Fang, J Jiao, Q Tian Pattern Recognition, 111386, 2025 | 19 | 2025 |
Semantic-aware generation for self-supervised visual representation learning Y Tian, L Xie, X Zhang, J Fang, H Xu, W Huang, J Jiao, Q Tian, Q Ye arXiv preprint arXiv:2111.13163, 2021 | 11 | 2021 |
ChatterBox: Multi-round Multimodal Referring and Grounding Y Tian, T Ma, L Xie, J Qiu, X Tang, Y Zhang, J Jiao, Q Tian, Q Ye arXiv preprint arXiv:2401.13307, 2024 | 10 | 2024 |
vHeat: Building Vision Models upon Heat Conduction Z Wang, Y Liu, Y Liu, H Yu, Y Wang, Q Ye, Y Tian arXiv preprint arXiv:2405.16555, 2024 | 6 | 2024 |
Fast-iTPN: Integrally pre-trained transformer pyramid network with token migration Y Tian, L Xie, J Qiu, J Jiao, Y Wang, Q Tian, Q Ye IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | 5 | 2024 |
Artemis: Towards Referential Understanding in Complex Videos J Qiu, Y Zhang, X Tang, L Xie, T Ma, P Yan, D Doermann, Q Ye, Y Tian arXiv preprint arXiv:2406.00258, 2024 | 5 | 2024 |
Semantic-aware generation for self-supervised visual representation learning Y Tian, L Xie, X Zhang, J Fang, H Xu, W Huang, J Jiao, Q Tian, Q Ye 2021 | 5 | 2021 |
Genetic feature fusion for object skeleton detection Y Qiao, Y Tian, Y Liu, J Jiao Security and Communication Networks 2021 (1), 6621760, 2021 | 5 | 2021 |
Personalized Large Vision-Language Models C Pham, H Phan, D Doermann, Y Tian arXiv preprint arXiv:2412.17610, 2024 | | 2024 |
Exploring Complicated Search Spaces With Interleaving-Free Sampling Y Tian, L Xie, J Fang, J Jiao, Q Ye, Q Tian IEEE Transactions on Neural Networks and Learning Systems, 2024 | | 2024 |