InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks | Z Chen, J Wu, W Wang, W Su, G Chen, S Xing, Z Muyan, Q Zhang, X Zhu, ... | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 | 612* | 2024 |
VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks | W Wang, Z Chen, X Chen, J Wu, X Zhu, G Zeng, P Luo, T Lu, J Zhou, ... | Advances in Neural Information Processing Systems (NeurIPS), 2023 | 446 | 2023 |
Language as queries for referring video object segmentation | J Wu, Y Jiang, P Sun, Z Yuan, P Luo | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 168 | 2022 |
Universal instance perception as object discovery and retrieval | B Yan, Y Jiang, J Wu, D Wang, P Luo, Z Yuan, H Lu | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 167 | 2023 |
Watch only once: An end-to-end video action detection framework | S Chen, P Sun, E Xie, C Ge, J Wu, L Ma, J Shen, P Luo | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 74 | 2021 |
Groma: Localized visual tokenization for grounding multimodal large language models | C Ma, Y Jiang, J Wu, Z Yuan, X Qi | European Conference on Computer Vision, 417-435, 2024 | 31 | 2024 |
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | J Wu, M Zhong, S Xing, Z Lai, Z Liu, W Wang, Z Chen, X Zhu, L Lu, T Lu, ... | arXiv preprint arXiv:2406.08394, 2024 | 30 | 2024 |
Self-supervised video representation learning with motion-aware masked autoencoders | H Yang, D Huang, B Wen, J Wu, H Yao, Y Jiang, X Zhu, Z Yuan | arXiv preprint arXiv:2210.04154, 2022 | 19 | 2022 |
The first visual object tracking segmentation VOTS2023 challenge results | M Kristan, J Matas, M Danelljan, M Felsberg, HJ Chang, LČ Zajc, ... | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 13 | 2023 |
Segment every reference object in spatial and temporal spaces | J Wu, Y Jiang, B Yan, H Lu, Z Yuan, P Luo | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 13 | 2023 |
Development of an effective model for computing rightmost eigenvalues of power systems with inclusion of time delays | C Li, J Wu, C Duan, Z Du | IEEE Transactions on Power Systems 34 (6), 4216-4227, 2019 | 13 | 2019 |
Towards high-quality temporal action detection with sparse proposals | J Wu, P Sun, S Chen, J Yang, Z Qi, L Ma, P Luo | arXiv preprint arXiv:2109.08847, 2021 | 11 | 2021 |
Exploring transformers for open-world instance segmentation | J Wu, Y Jiang, B Yan, H Lu, Z Yuan, P Luo | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 7 | 2023 |
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | J Wu, Y Jiang, B Yan, H Lu, Z Yuan, P Luo | arXiv preprint arXiv:2312.15715, 2023 | 5 | 2023 |
A Simple Baseline for Open-World Tracking via Self-training | B Wang, T Li, J Wu, Y Jiang, H Lu, Y He | Proceedings of the 31st ACM International Conference on Multimedia, 2765-2774, 2023 | 2 | 2023 |
Multi-Level Contrastive Learning for Dense Prediction Task | Q Guo, Y Yu, Y Jiang, J Wu, Z Yuan, P Luo | arXiv preprint arXiv:2304.02010, 2023 | 2 | 2023 |
Method, apparatus, device, and medium for processing visual task by generic model | Y Jiang, B Yan, J Wu, Y Zehuan | US Patent App. 18/531,091, 2024 | | 2024 |
Method, apparatus, device and medium for processing image using machine learning model | Y Jiang, J Wu, B Yan, Y Zehuan | US Patent App. 18/499,066, 2024 | | 2024 |
MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders | H Yang, D Huang, B Wen, J Wu, H Yao, Y Jiang, X Zhu, Z Yuan | | | 2024 |