Uniformer: Unifying convolution and self-attention for visual recognition K Li, Y Wang, J Zhang, P Gao, G Song, Y Liu, H Li, Y Qiao IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 | 744* | 2023 |
Videochat: Chat-centric video understanding KC Li, Y He, Y Wang, Y Li, W Wang, P Luo, Y Wang, L Wang, Y Qiao SCIENCE CHINA Information Sciences, 2023 | 587 | 2023 |
Adaptive pyramid context network for semantic segmentation J He, Z Deng, L Zhou, Y Wang, Y Qiao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 435 | 2019 |
Lstd: A low-shot transfer detector for object detection H Chen, Y Wang, G Wang, Y Qiao Proceedings of the AAAI conference on artificial intelligence 32 (1), 2018 | 404 | 2018 |
Learning attentive pairwise interaction for fine-grained classification P Zhuang, Y Wang, Y Qiao Proceedings of the AAAI conference on artificial intelligence 34 (07), 13130 …, 2020 | 397 | 2020 |
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking L Wang, B Huang, Z Zhao, Z Tong, Y He, Y Wang, Y Wang, Y Qiao CVPR 2023, 2023 | 384 | 2023 |
Internvideo: General video foundation models via generative and discriminative learning Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ... arXiv preprint arXiv:2212.03191, 2022 | 336 | 2022 |
Mvbench: A comprehensive multi-modal video understanding benchmark K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ... CVPR 2024, 2024 | 269 | 2024 |
Recurrent spatial-temporal attention network for action recognition in videos W Du, Y Wang, Y Qiao IEEE Transactions on Image Processing 27 (3), 1347-1360, 2017 | 232 | 2017 |
Internvid: A large-scale video-text dataset for multimodal understanding and generation Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ... ICLR 2024, 2024 | 229 | 2024 |
Rpan: An end-to-end recurrent pose-attention network for action recognition in videos W Du, Y Wang, Y Qiao Proceedings of the IEEE international conference on computer vision, 3725-3734, 2017 | 224 | 2017 |
Uniformerv2: Spatiotemporal learning by arming image vits with video uniformer K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao ICCV 2023, 2023 | 180* | 2023 |
Videomamba: State space model for efficient video understanding K Li, X Li, Y Wang, Y He, Y Wang, L Wang, Y Qiao ECCV 2024, 2024 | 153 | 2024 |
Unmasked teacher: Towards training-efficient video foundation models K Li, Y Wang, Y Li, Y Wang, Y He, L Wang, Y Qiao ICCV 2023, 2023 | 153 | 2023 |
Internvideo2: Scaling video foundation models for multimodal video understanding Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei, R Zheng, J Xu, Z Wang, ... ECCV 2024, 2024 | 128* | 2024 |
Smallbignet: Integrating core and contextual views for video classification X Li, Y Wang, Z Zhou, Y Qiao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 122 | 2020 |
Metacleaner: Learning to hallucinate clean representations for noisy-labeled visual recognition W Zhang, Y Wang, Y Qiao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 117 | 2019 |
PA3D: Pose-action 3D machine for video recognition A Yan, Y Wang, Z Li, Y Qiao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 109 | 2019 |
Starting from non-parametric networks for 3d point cloud analysis R Zhang, L Wang, Y Wang, P Gao, H Li, J Shi Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 105* | 2023 |
Mining Inter-Video Proposal Relations for Video Object Detection M Han, Y Wang, X Chang, Y Qiao European Conference on Computer Vision (ECCV), 2020 | 105 | 2020 |