InternLM-XComposer: A vision-language large model for advanced text-image comprehension and composition P Zhang, X Dong, B Wang, Y Cao, C Xu, L Ouyang, Z Zhao, S Ding, S Zhang, ... arXiv preprint arXiv:2309.15112, 2023 | 189 | 2023 |
Towards more practical adversarial attacks on graph neural networks J Ma*, S Ding*, Q Mei Advances in neural information processing systems 33, 4756-4766, 2020 | 146 | 2020 |
Motion-aware Self-supervised Video Representation Learning via Foreground-background Merging S Ding, M Li, T Yang, R Qian, H Xu, Q Chen, J Wang, H Xiong Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021 | 63* | 2021 |
Masked autoencoders are robust data augmentors H Xu, S Ding, X Zhang, H Xiong, Q Tian arXiv preprint arXiv:2206.04846, 2022 | 32 | 2022 |
Enhancing self-supervised video representation learning via multi-level feature optimization R Qian, Y Li, H Liu, J See, S Ding, X Liu, D Li, W Lin Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 32 | 2021 |
Static and dynamic concepts for self-supervised video representation learning R Qian, S Ding, X Liu, D Lin European Conference on Computer Vision, 145-164, 2022 | 28 | 2022 |
SongComposer: A large language model for lyric and melody composition in song generation S Ding, Z Liu, X Dong, P Zhang, R Qian, C He, D Lin, J Wang arXiv preprint arXiv:2402.17645, 2024 | 25 | 2024 |
Dual contrastive learning for spatio-temporal representation S Ding, R Qian, H Xiong Proceedings of the 30th ACM international conference on multimedia, 5649-5658, 2022 | 22 | 2022 |
Prune spatio-temporal tokens by semantic-aware temporal accumulation S Ding, P Zhao, X Zhang, R Qian, H Xiong, Q Tian Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 19 | 2023 |
Streaming long video understanding with large language models R Qian, X Dong, P Zhang, Y Zang, S Ding, D Lin, J Wang Advances in Neural Information Processing Systems 37, 119336-119360, 2025 | 17 | 2025 |
Semantics meets temporal correspondence: Self-supervised object-centric learning in videos R Qian, S Ding, X Liu, D Lin Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 17 | 2023 |
Motion-inductive self-supervised object discovery in videos S Ding, W Xie, Y Chen, R Qian, X Zhang, H Xiong, Q Tian arXiv preprint arXiv:2210.00221, 2022 | 17 | 2022 |
Betrayed by attention: A simple yet effective approach for self-supervised video object segmentation S Ding*, R Qian*, H Xu, D Lin, H Xiong European Conference on Computer Vision, 215-233, 2025 | 5 | 2025 |
InternLM-XComposer2.5-OmniLive: A comprehensive multimodal system for long-term streaming video and audio interactions P Zhang, X Dong, Y Cao, Y Zang, R Qian, X Wei, L Chen, Y Li, J Niu, ... arXiv preprint arXiv:2412.09596, 2024 | 4 | 2024 |
SAM2Long: Enhancing SAM 2 for long video segmentation with a training-free memory tree S Ding, R Qian, X Dong, P Zhang, Y Zang, Y Cao, Y Guo, D Lin, J Wang arXiv preprint arXiv:2410.16268, 2024 | 3 | 2024 |
Image compression for machine and human vision with spatial-frequency adaptation H Li, S Li, S Ding, W Dai, M Cao, C Li, J Zou, H Xiong European Conference on Computer Vision, 382-399, 2024 | 3 | 2024 |
AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training L Ding, W Fei, Y Huang, S Ding, W Dai, C Li, J Zou, H Xiong Forty-first International Conference on Machine Learning, 2024 | 2 | 2024 |
Rethinking image-to-video adaptation: An object-centric perspective R Qian, S Ding, D Lin European Conference on Computer Vision, 329-348, 2024 | 1 | 2024 |
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Y Li, J Niu, Z Miao, C Ge, Y Zhou, Q He, X Dong, H Duan, S Ding, R Qian, ... arXiv preprint arXiv:2501.05510, 2025 | | 2025 |
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction R Qian, S Ding, X Dong, P Zhang, Y Zang, Y Cao, D Lin, J Wang arXiv preprint arXiv:2501.03218, 2025 | | 2025 |