Title | Authors | Venue | Citations | Year |
Video understanding with large language models: A survey | Y Tang, J Bi, S Xu, L Song, S Liang, T Wang, D Zhang, J An, J Lin, R Zhu, ... | arXiv preprint arXiv:2312.17432, 2023 | 63 | 2023 |
Procedure planning in instructional videos via contextual modeling and model-based policy learning | J Bi, J Luo, C Xu | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 49 | 2021 |
Avicuna: Audio-visual llm with interleaver and context-boundary alignment for temporal referential dialogue | Y Tang, D Shimada, J Bi, C Xu | arXiv e-prints, arXiv:2403.16276, 2024 | 19 | 2024 |
Navigation by imitation in a pedestrian-rich environment | J Bi, T Xiao, Q Sun, C Xu | arXiv preprint arXiv:1811.00506, 2018 | 14 | 2018 |
Learning from interventions using hierarchical policies for safe learning | J Bi, V Dhiman, T Xiao, C Xu | Proceedings of the AAAI Conference on Artificial Intelligence 34 (06), 10352 …, 2020 | 10 | 2020 |
Oscar: Object state captioning and state change representation | N Nguyen, J Bi, A Vosoughi, Y Tian, P Fazli, C Xu | arXiv preprint arXiv:2402.17128, 2024 | 8 | 2024 |
MISAR: A multimodal instructional system with augmented reality | J Bi, NM Nguyen, A Vosoughi, C Xu | arXiv preprint arXiv:2310.11699, 2023 | 6 | 2023 |
Deep Learning for Musical Instrument Recognition | M Yun, J Bi | University of Rochester, 2017 | 3 | 2017 |
EAGLE: Egocentric AGgregated Language-video Engine | J Bi, Y Tang, L Song, A Vosoughi, N Nguyen, C Xu | arXiv preprint arXiv:2409.17523, 2024 | 2 | 2024 |
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? | Y Tang, J Guo, H Hua, S Liang, M Feng, X Li, R Mao, C Huang, J Bi, ... | arXiv preprint arXiv:2411.10979, 2024 | 1 | 2024 |
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding | Y Tang, D Shimada, J Bi, M Feng, H Hua, C Xu | arXiv preprint arXiv:2403.16276, 2024 | 1 | 2024 |
Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences | J Shi, J Bi, Y Liu, C Xu | arXiv preprint arXiv:2010.01381, 2020 | 1 | 2020 |
Generative AI for Cel-Animation: A Survey | Y Tang, J Guo, P Liu, Z Wang, H Hua, JX Zhong, Y Xiao, C Huang, L Song, ... | arXiv preprint arXiv:2501.06250, 2025 | | 2025 |
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach | J Bi, J Guo, Y Tang, LB Wen, Z Liu, C Xu | arXiv preprint arXiv:2412.18108, 2024 | | 2024 |