Multi-task collaborative network for joint referring expression comprehension and segmentation G Luo, Y Zhou, X Sun, L Cao, C Wu, C Deng, R Ji Proceedings of the IEEE/CVF Conference on computer vision and pattern …, 2020 | 328 | 2020 |
RSTNet: Captioning with adaptive attention on visual and non-visual words X Zhang, X Sun, Y Luo, J Ji, Y Zhou, Y Wu, F Huang, R Ji Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 260 | 2021 |
SeqTR: A Simple yet Universal Network for Visual Grounding C Zhu, Y Zhou, Y Shen, G Luo, X Pan, M Lin, C Chen, L Cao, X Sun, R Ji European Conference on Computer Vision, 598-615, 2022 | 152 | 2022 |
Cascade grouped attention network for referring expression segmentation G Luo, Y Zhou, R Ji, X Sun, J Su, CW Lin, Q Tian Proceedings of the 28th ACM International Conference on Multimedia, 1274-1282, 2020 | 133 | 2020 |
TRAR: Routing the attention spans in transformer for visual question answering Y Zhou, T Ren, C Zhu, X Sun, J Liu, X Ding, M Xu, R Ji Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 111 | 2021 |
Cheap and quick: Efficient vision-language instruction tuning for large language models G Luo, Y Zhou, T Ren, S Chen, X Sun, R Ji NeurIPS 2023, 2023 | 110 | 2023 |
Towards efficient visual adaption via structural re-parameterization G Luo, M Huang, Y Zhou, X Sun, G Jiang, Z Wang, R Ji arXiv preprint arXiv:2302.08106, 2023 | 78 | 2023 |
Active teacher for semi-supervised object detection P Mi, J Lin, Y Zhou, Y Shen, G Luo, X Sun, L Cao, R Fu, Q Xu, R Ji Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 75 | 2022 |
A real-time global inference network for one-stage referring expression comprehension Y Zhou, R Ji, G Luo, X Sun, J Su, X Ding, CW Lin, Q Tian IEEE Transactions on Neural Networks and Learning Systems 34 (1), 134-143, 2021 | 72 | 2021 |
Towards local visual modeling for image captioning Y Ma, J Ji, X Sun, Y Zhou, R Ji Pattern Recognition 138, 109420, 2023 | 71 | 2023 |
Make sharpness-aware minimization stronger: A sparsified perturbation approach P Mi, L Shen, T Ren, Y Zhou, X Sun, R Ji, D Tao Advances in Neural Information Processing Systems 35, 30950-30962, 2022 | 68 | 2022 |
DIFNet: Boosting visual information flow for image captioning M Wu, X Zhang, X Sun, Y Zhou, C Chen, J Gu, X Sun, R Ji Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 53 | 2022 |
Towards lightweight transformer via group-wise transformation for vision-and-language tasks G Luo, Y Zhou, X Sun, Y Wang, L Cao, Y Wu, F Huang, R Ji IEEE Transactions on Image Processing 31, 3386-3398, 2022 | 50 | 2022 |
Dynamic capsule attention for visual question answering Y Zhou, R Ji, J Su, X Sun, W Chen Proceedings of the AAAI conference on artificial intelligence 33 (01), 9324-9331, 2019 | 47 | 2019 |
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models G Luo, Y Zhou, Y Zhang, X Zheng, X Sun, R Ji arXiv preprint arXiv:2403.03003, 2024 | 45 | 2024 |
Knowing what to learn: a metric-oriented focal mechanism for image captioning J Ji, Y Ma, X Sun, Y Zhou, Y Wu, R Ji IEEE Transactions on Image Processing 31, 4321-4335, 2022 | 42 | 2022 |
Survey of visual sentiment prediction for social media analysis R Ji, D Cao, Y Zhou, F Chen Frontiers of Computer Science 10, 602-611, 2016 | 34 | 2016 |
Knowledge-driven generative adversarial network for text-to-image synthesis J Peng, Y Zhou, X Sun, L Cao, Y Wu, F Huang, R Ji IEEE Transactions on Multimedia 24, 4356-4366, 2021 | 31 | 2021 |
K-armed bandit based multi-modal network architecture search for visual question answering Y Zhou, R Ji, X Sun, G Luo, X Hong, J Su, X Ding, L Shao Proceedings of the 28th ACM international conference on multimedia, 1245-1254, 2020 | 26 | 2020 |
Knowing what it is: semantic-enhanced dual attention transformer Y Ma, J Ji, X Sun, Y Zhou, Y Wu, F Huang, R Ji IEEE Transactions on Multimedia 25, 3723-3736, 2022 | 25 | 2022 |