Title | Authors | Venue | Cited by | Year |
Twins: Revisiting the design of spatial attention in vision transformers | X Chu, Z Tian, Y Wang, B Zhang, H Ren, X Wei, H Xia, C Shen | Advances in Neural Information Processing Systems 34, 2021 | 1158 | 2021 |
End-to-End Video Instance Segmentation with Transformers | Y Wang, Z Xu, X Wang, C Shen, B Cheng, H Shen, H Xia | IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2021 | 833 | 2021 |
Centermask: single shot instance segmentation with point representation | Y Wang, Z Xu, H Shen, B Cheng, L Yang | Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 107 | 2020 |
Loong: Generating minute-level long videos with autoregressive language models | Y Wang, T Xiong, D Zhou, Z Lin, Y Zhao, B Kang, J Feng, X Liu | arXiv preprint arXiv:2410.02757, 2024 | 20 | 2024 |
Discovering sounding objects by audio queries for audio visual segmentation | S Huang, H Li, Y Wang, H Zhu, J Dai, J Han, W Rong, S Liu | arXiv preprint arXiv:2309.09501, 2023 | 12 | 2023 |
Lvd-2m: A long-take video dataset with temporally dense captions | T Xiong, Y Wang, D Zhou, Z Lin, J Feng, X Liu | arXiv preprint arXiv:2410.10816, 2024 | 3 | 2024 |
Parallelized Autoregressive Visual Generation | Y Wang, S Ren, Z Lin, Y Han, H Guo, Z Yang, D Zou, J Feng, X Liu | arXiv preprint arXiv:2412.15119, 2024 | | 2024 |