Title | Authors | Venue | Cited by | Year |
VMamba: Visual state space model | Y Liu, Y Tian, Y Zhao, H Yu, L Xie, Y Wang, Q Ye, J Jiao, Y Liu | Advances in Neural Information Processing Systems 37, 103031-103063, 2025 | 1100 | 2025 |
GraFormer: Graph-oriented transformer for 3D pose estimation | W Zhao, W Wang, Y Tian | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 144* | 2022 |
HiViT: A simpler and more efficient design of hierarchical vision transformer | X Zhang, Y Tian, L Xie, W Huang, Q Dai, Q Ye, Q Tian | The Eleventh International Conference on Learning Representations, 2023 | 95* | 2023 |
Spatial transform decoupling for oriented object detection | H Yu, Y Tian, Q Ye, Y Liu | Proceedings of the AAAI Conference on Artificial Intelligence 38 (7), 6782-6790, 2024 | 36 | 2024 |
Discretization-aware architecture search | Y Tian, C Liu, L Xie, Q Ye | Pattern Recognition 120, 108186, 2021 | 32 | 2021 |
Adaptive linear span network for object skeleton detection | C Liu, Y Tian, Z Chen, J Jiao, Q Ye | IEEE Transactions on Image Processing 30, 5096-5108, 2021 | 31 | 2021 |
Integrally pre-trained transformer pyramid networks | Y Tian, L Xie, Z Wang, L Wei, X Zhang, J Jiao, Y Wang, Q Tian, Q Ye | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 29 | 2023 |
Beyond masking: Demystifying token-based pre-training for vision transformers | Y Tian, L Xie, J Fang, J Jiao, Q Tian | Pattern Recognition, 111386, 2025 | 23* | 2025 |
VMamba: Visual state space model (2024) | Y Liu, Y Tian, Y Zhao, H Yu, L Xie, Y Wang, Q Ye, Y Liu | arXiv preprint arXiv:2401.10166, 2024 | 22 | 2024 |
Semantic-aware generation for self-supervised visual representation learning | Y Tian, L Xie, X Zhang, J Fang, H Xu, W Huang, J Jiao, Q Tian, Q Ye | arXiv preprint arXiv:2111.13163, 2021 | 11 | 2021 |
Chatterbox: Multi-round multimodal referring and grounding | Y Tian, T Ma, L Xie, J Qiu, X Tang, Y Zhang, J Jiao, Q Tian, Q Ye | arXiv preprint arXiv:2401.13307, 2024 | 10 | 2024 |
Fast-iTPN: Integrally pre-trained transformer pyramid network with token migration | Y Tian, L Xie, J Qiu, J Jiao, Y Wang, Q Tian, Q Ye | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | 8 | 2024 |
vHeat: Building vision models upon heat conduction | Z Wang, Y Liu, Y Liu, H Yu, Y Wang, Q Ye, Y Tian | arXiv preprint arXiv:2405.16555, 2024 | 6 | 2024 |
Artemis: Towards referential understanding in complex videos | J Qiu, Y Zhang, X Tang, L Xie, T Ma, P Yan, D Doermann, Q Ye, Y Tian | The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024 | 5 | 2024 |
Genetic feature fusion for object skeleton detection | Y Qiao, Y Tian, Y Liu, J Jiao | Security and Communication Networks 2021 (1), 6621760, 2021 | 5 | 2021 |
YOLOv12: Attention-Centric Real-Time Object Detectors | Y Tian, Q Ye, D Doermann | arXiv preprint arXiv:2502.12524, 2025 | | 2025 |
Personalized Large Vision-Language Models | C Pham, H Phan, D Doermann, Y Tian | arXiv preprint arXiv:2412.17610, 2024 | | 2024 |
Exploring Complicated Search Spaces With Interleaving-Free Sampling | Y Tian, L Xie, J Fang, J Jiao, Q Ye, Q Tian | IEEE Transactions on Neural Networks and Learning Systems, 2024 | | 2024 |
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension | T Ma, L Xie, Y Tian, B Yang, Q Ye | arXiv preprint arXiv:2406.11327, 2024 | | 2024 |
ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding | T Ma, L Xie, Y Tian, B Yang, Y Zhang, D Doermann, Q Ye | arXiv e-prints, arXiv:2406.11327, 2024 | | 2024 |