Invariant grounding for video question answering Y Li, X Wang, J Xiao, W Ji, TS Chua Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 134 | 2022 |
Video as conditional graph hierarchy for multi-granular question answering J Xiao, A Yao, Z Liu, Y Li, W Ji, TS Chua Proceedings of the AAAI Conference on Artificial Intelligence 36 (3), 2804-2812, 2022 | 130 | 2022 |
Video question answering: Datasets, algorithms and challenges Y Zhong, J Xiao, W Ji, Y Li, W Deng, TS Chua arXiv preprint arXiv:2203.01225, 2022 | 99 | 2022 |
Interventional video relation detection Y Li, X Yang, X Shang, TS Chua Proceedings of the 29th ACM International Conference on Multimedia, 4091-4099, 2021 | 65 | 2021 |
Can I trust your answer? Visually grounded video question answering J Xiao, A Yao, Y Li, TS Chua Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 46 | 2024 |
Video visual relation detection via iterative inference X Shang, Y Li, J Xiao, W Ji, TS Chua Proceedings of the 29th ACM International Conference on Multimedia, 3654-3663, 2021 | 39 | 2021 |
Contrastive video question answering via video graph transformer J Xiao, P Zhou, A Yao, Y Li, R Hong, S Yan, TS Chua IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (11 …, 2023 | 38 | 2023 |
Equivariant and invariant grounding for video question answering Y Li, X Wang, J Xiao, TS Chua Proceedings of the 30th ACM International Conference on Multimedia, 4714-4722, 2022 | 36 | 2022 |
Discovering spatio-temporal rationales for video question answering Y Li, J Xiao, C Feng, X Wang, TS Chua Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 23 | 2023 |
Redundancy-aware transformer for video question answering Y Li, X Yang, A Zhang, C Feng, X Wang, TS Chua Proceedings of the 31st ACM International Conference on Multimedia, 3172-3180, 2023 | 17 | 2023 |
VidVRD 2021: The third grand challenge on video relation detection W Ji, Y Li, M Wei, X Shang, J Xiao, T Ren, TS Chua Proceedings of the 29th ACM International Conference on Multimedia, 4779-4783, 2021 | 15 | 2021 |
LASO: Language-guided affordance segmentation on 3D object Y Li, N Zhao, J Xiao, C Feng, X Wang, TS Chua Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 11 | 2024 |
Transformer-empowered invariant grounding for video question answering Y Li, X Wang, J Xiao, W Ji, TS Chua IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 | 10 | 2023 |
Video-language understanding: A survey from model architecture, model training, and data perspectives T Nguyen, Y Bin, J Xiao, L Qu, Y Li, JZ Wu, CD Nguyen, SK Ng, LA Tuan arXiv preprint arXiv:2406.05615, 2024 | 7 | 2024 |
VideoQA in the era of LLMs: An empirical study J Xiao, N Huang, H Qin, D Li, Y Li, F Zhu, Z Tao, J Yu, L Lin, TS Chua, ... International Journal of Computer Vision, 1-24, 2025 | 6 | 2025 |
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering S Zhou, J Xiao, Q Li, Y Li, X Yang, D Guo, M Wang, TS Chua, A Yao arXiv preprint arXiv:2502.07411, 2025 | | 2025 |