Invariant grounding for video question answering Y Li, X Wang, J Xiao, W Ji, TS Chua Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 132 | 2022 |
Video as conditional graph hierarchy for multi-granular question answering J Xiao, A Yao, Z Liu, Y Li, W Ji, TS Chua Proceedings of the AAAI Conference on Artificial Intelligence 36 (3), 2804-2812, 2022 | 126 | 2022 |
Video question answering: Datasets, algorithms and challenges Y Zhong, J Xiao, W Ji, Y Li, W Deng, TS Chua arXiv preprint arXiv:2203.01225, 2022 | 101 | 2022 |
Interventional video relation detection Y Li, X Yang, X Shang, TS Chua Proceedings of the 29th ACM International Conference on Multimedia, 4091-4099, 2021 | 65 | 2021 |
Can I trust your answer? Visually grounded video question answering J Xiao, A Yao, Y Li, TS Chua Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 44 | 2024 |
Video visual relation detection via iterative inference X Shang, Y Li, J Xiao, W Ji, TS Chua Proceedings of the 29th ACM International Conference on Multimedia, 3654-3663, 2021 | 39 | 2021 |
Equivariant and invariant grounding for video question answering Y Li, X Wang, J Xiao, TS Chua Proceedings of the 30th ACM International Conference on Multimedia, 4714-4722, 2022 | 35 | 2022 |
Contrastive video question answering via video graph transformer J Xiao, P Zhou, A Yao, Y Li, R Hong, S Yan, TS Chua IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 | 34 | 2023 |
Discovering spatio-temporal rationales for video question answering Y Li, J Xiao, C Feng, X Wang, TS Chua Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 21 | 2023 |
Redundancy-aware transformer for video question answering Y Li, X Yang, A Zhang, C Feng, X Wang, TS Chua Proceedings of the 31st ACM International Conference on Multimedia, 3172-3180, 2023 | 16 | 2023 |
VidVRD 2021: The third grand challenge on video relation detection W Ji, Y Li, M Wei, X Shang, J Xiao, T Ren, TS Chua Proceedings of the 29th ACM International Conference on Multimedia, 4779-4783, 2021 | 15 | 2021 |
LASO: Language-guided Affordance Segmentation on 3D Object Y Li, N Zhao, J Xiao, C Feng, X Wang, TS Chua Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 10 | 2024 |
Transformer-empowered invariant grounding for video question answering Y Li, X Wang, J Xiao, W Ji, TS Chua IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 | 9 | 2023 |
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives T Nguyen, Y Bin, J Xiao, L Qu, Y Li, JZ Wu, CD Nguyen, SK Ng, LA Tuan arXiv preprint arXiv:2406.05615, 2024 | 6 | 2024 |
VideoQA in the era of LLMs: An empirical study J Xiao, N Huang, H Qin, D Li, Y Li, F Zhu, Z Tao, J Yu, L Lin, TS Chua, ... arXiv preprint arXiv:2408.04223, 2024 | 4 | 2024 |