MSMDFusion: Fusing LiDAR and camera at multiple scales with multi-depth seeds for 3D object detection Y Jiao, Z Jie, S Chen, J Chen, L Ma, YG Jiang Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023 | 106 | 2023 |
NuScenes-QA: A multi-modal visual question answering benchmark for autonomous driving scenario T Qian, J Chen, L Zhuo, Y Jiao, YG Jiang Proceedings of the AAAI Conference on Artificial Intelligence 38 (5), 4542-4550, 2024 | 104 | 2024 |
MORE: Multi-order relation mining for dense captioning in 3D scenes Y Jiao, S Chen, Z Jie, J Chen, L Ma, YG Jiang European Conference on Computer Vision, 528-545, 2022 | 43 | 2022 |
Two-stage visual cues enhancement network for referring image segmentation Y Jiao, Z Jie, W Luo, J Chen, YG Jiang, X Wei, L Ma Proceedings of the 29th ACM international conference on multimedia, 1331-1340, 2021 | 26 | 2021 |
Lumen: Unleashing versatile vision-centric capabilities of large multimodal models Y Jiao, S Chen, Z Jie, J Chen, L Ma, YG Jiang NeurIPS 2024, 2024 | 12 | 2024 |
Eyes can deceive: Benchmarking counterfactual reasoning abilities of multi-modal large language models Y Li, W Tian, Y Jiao, J Chen, YG Jiang arXiv e-prints, arXiv:2404.12966, 2024 | 12 | 2024 |
From canteen food to daily meals: Generalizing food recognition to more practical scenarios G Liu, Y Jiao, J Chen, B Zhu, YG Jiang IEEE Transactions on Multimedia, 2024 | 10 | 2024 |
EventHallusion: Diagnosing event hallucinations in video LLMs J Zhang, Y Jiao, S Chen, J Chen, YG Jiang arXiv preprint arXiv:2409.16597, 2024 | 8 | 2024 |
Instance-aware multi-camera 3D object detection with structural priors mining and self-boosting learning Y Jiao, Z Jie, S Chen, L Cheng, J Chen, L Ma, YG Jiang Proceedings of the AAAI Conference on Artificial Intelligence 38 (3), 2598-2606, 2024 | 8 | 2024 |
Suspected Objects Matter: Rethinking Model's Prediction for One-stage Visual Grounding Y Jiao, Z Jie, J Chen, L Ma, YG Jiang Proceedings of the 31st ACM International Conference on Multimedia, 17-26, 2023 | 7 | 2023 |
EAGLE: Towards efficient arbitrary referring visual prompts comprehension for multimodal large language models J Zhang, Y Jiao, S Chen, J Chen, YG Jiang arXiv preprint arXiv:2409.16723, 2024 | 1 | 2024 |