Dense regression network for video grounding R Zeng, H Xu, W Huang, P Chen, M Tan, C Gan Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 320 | 2020 |
3D-LLM: Injecting the 3D world into large language models Y Hong, H Zhen, P Chen, S Zheng, Y Du, Z Chen, C Gan Advances in Neural Information Processing Systems 36, 20482-20494, 2023 | 249 | 2023 |
Location-aware graph convolutional networks for video question answering D Huang, P Chen, R Zeng, Q Du, M Tan, C Gan Proceedings of the AAAI Conference on Artificial Intelligence 34 (07), 11021 …, 2020 | 211 | 2020 |
Self-supervised moving vehicle tracking with stereo sound C Gan, H Zhao, P Chen, D Cox, A Torralba Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 173 | 2019 |
Foley music: Learning to generate music from videos C Gan, D Huang, P Chen, JB Tenenbaum, A Torralba Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 154 | 2020 |
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning P Chen, D Huang, D He, X Long, R Zeng, S Wen, M Tan, C Gan AAAI Conference on Artificial Intelligence, 2021, 2020 | 130 | 2020 |
Generating visually aligned sound from videos P Chen, Y Zhang, M Tan, H Xiao, D Huang, C Gan IEEE Transactions on Image Processing 29, 8292-8302, 2020 | 100 | 2020 |
Breaking winner-takes-all: Iterative-winners-out networks for weakly supervised temporal action localization R Zeng, C Gan, P Chen, W Huang, Q Wu, M Tan IEEE Transactions on Image Processing 28 (12), 5797-5808, 2019 | 95 | 2019 |
Relation attention for temporal action localization P Chen, C Gan, G Shen, W Huang, R Zeng, M Tan IEEE Transactions on Multimedia 22 (10), 2723-2733, 2019 | 84 | 2019 |
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation P Chen, D Ji, K Lin, R Zeng, TH Li, M Tan, C Gan NeurIPS 2022, 2022 | 56 | 2022 |
3D-VLA: A 3D vision-language-action generative world model H Zhen, X Qiu, P Chen, J Yang, X Yan, Y Du, Y Hong, C Gan arXiv preprint arXiv:2403.09631, 2024 | 48 | 2024 |
Vesper: A compact and effective pretrained model for speech emotion recognition W Chen, X Xing, P Chen, X Xu IEEE Transactions on Affective Computing, 2024 | 37 | 2024 |
Masked motion encoding for self-supervised video representation learning X Sun, P Chen, L Chen, C Li, TH Li, M Tan, C Gan Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 35 | 2023 |
Learning Active Camera for Multi-Object Navigation P Chen, D Ji, K Lin, W Hu, W Huang, TH Li, M Tan, C Gan NeurIPS 2022, 2022 | 26 | 2022 |
A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models P Chen, X Sun, H Zhi, R Zeng, TH Li, G Liu, M Tan, C Gan arXiv preprint arXiv:2308.07997, 2023 | 22 | 2023 |
Learning vision-and-language navigation from YouTube videos K Lin, P Chen, D Huang, TH Li, M Tan, C Gan Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 22 | 2023 |
MultiPLY: A multisensory object-centric embodied large language model in 3D world Y Hong, Z Zheng, P Chen, Y Wang, J Li, C Gan Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 19 | 2024 |
CoVLM: Composing visual entities and relationships in large language models via communicative decoding J Li, D Chen, Y Hong, Z Chen, P Chen, Y Shen, C Gan arXiv preprint arXiv:2311.03354, 2023 | 11 | 2023 |
FGPrompt: fine-grained goal prompting for image-goal navigation X Sun, P Chen, J Fan, J Chen, T Li, M Tan Advances in Neural Information Processing Systems 36, 2024 | 7 | 2024 |
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation Z Yang, J Liu, P Chen, A Cherian, TK Marks, J Le Roux, C Gan Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 5 | 2024 |