BEVDepth: Acquisition of reliable depth for multi-view 3D object detection Y Li, Z Ge, G Yu, J Yang, Z Wang, Y Shi, J Sun, Z Li Proceedings of the AAAI Conference on Artificial Intelligence 37 (2), 1477-1485, 2023 | 625 | 2023 |
BEVStereo: Enhancing depth estimation in multi-view 3D object detection with temporal stereo Y Li, H Bao, Z Ge, J Yang, J Sun, Z Li Proceedings of the AAAI Conference on Artificial Intelligence 37 (2), 1486-1494, 2023 | 223 | 2023 |
DreamLLM: Synergistic multimodal comprehension and creation R Dong, C Han, Y Peng, Z Qi, Z Ge, J Yang, L Zhao, J Sun, H Zhou, H Wei, ... arXiv preprint arXiv:2309.11499, 2023 | 150 | 2023 |
Cross modal transformer: Towards fast and robust 3D object detection J Yan, Y Liu, J Sun, F Jia, S Li, T Wang, X Zhang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 132* | 2023 |
Autoencoders as cross-modal teachers: Can pretrained 2D image transformers help 3D representation learning? R Dong, Z Qi, L Zhang, J Zhang, J Sun, Z Ge, L Yi, K Ma arXiv preprint arXiv:2212.08320, 2022 | 97 | 2022 |
Reversible column networks Y Cai, Y Zhou, Q Han, J Sun, X Kong, J Li, X Zhang arXiv preprint arXiv:2212.11696, 2022 | 84 | 2022 |
Vary: Scaling up the vision vocabulary for large vision-language model H Wei, L Kong, J Chen, L Zhao, Z Ge, J Yang, J Sun, C Han, X Zhang European Conference on Computer Vision, 408-424, 2024 | 75 | 2024 |
Exploring recurrent long-term temporal fusion for multi-view 3D perception C Han, J Yang, J Sun, Z Ge, R Dong, H Zhou, W Mao, Y Peng, X Zhang IEEE Robotics and Automation Letters, 2024 | 58 | 2024 |
ChatSpot: Bootstrapping multimodal LLMs via precise referring instruction tuning L Zhao, E Yu, Z Ge, J Yang, H Wei, H Zhou, J Sun, Y Peng, R Dong, ... arXiv preprint arXiv:2307.09474, 2023 | 50 | 2023 |
Small language model meets with reinforced vision vocabulary H Wei, L Kong, J Chen, L Zhao, Z Ge, E Yu, J Sun, C Han, X Zhang arXiv preprint arXiv:2401.12503, 2024 | 31 | 2024 |
General OCR theory: Towards OCR-2.0 via a unified end-to-end model H Wei, C Liu, J Chen, J Wang, L Kong, Y Xu, Z Ge, L Zhao, J Sun, Y Peng, ... | 20 | 2024 |
Focus Anywhere for Fine-grained Multi-page Document Understanding C Liu, H Wei, J Chen, L Kong, Z Ge, Z Zhu, L Zhao, J Sun, C Han, ... arXiv preprint arXiv:2405.14295, 2024 | 15 | 2024 |
OneChart: Purify the chart structural extraction via one auxiliary token J Chen, L Kong, H Wei, C Liu, Z Ge, L Zhao, J Sun, C Han, X Zhang Proceedings of the 32nd ACM International Conference on Multimedia, 147-155, 2024 | 13 | 2024 |
The 1st-place solution for CVPR 2023 OpenLane topology in autonomous driving challenge D Wu, F Jia, J Chang, Z Li, J Sun, C Han, S Li, Y Liu, Z Ge, T Wang arXiv preprint arXiv:2306.09590, 2023 | 12 | 2023 |
DistTrain: Addressing model and data heterogeneity with disaggregated training for multimodal large language models Z Zhang, Y Zhong, R Ming, H Hu, J Sun, Z Ge, Y Zhu, X Jin arXiv preprint arXiv:2408.04275, 2024 | 4 | 2024 |
BEVStereo++: Accurate depth estimation in multi-view 3D object detection via dynamic temporal stereo Y Li, J Yang, J Sun, H Bao, Z Ge, L Xiao arXiv preprint arXiv:2304.04185, 2023 | 4 | 2023 |
Slow Perception: Let's Perceive Geometric Figures Step-by-step H Wei, Y Yin, Y Li, J Wang, L Zhao, J Sun, Z Ge, X Zhang arXiv preprint arXiv:2412.20631, 2024 | | 2024 |
First Place Solution to the 3D Object Detection of the SSLAD2022 Challenge T Huang, Z Yao, L Liu, B Wang, T Jiang, J Sun, X Wang, Z Li, H Yao | | |