How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui, W Tong, K Hu, J Luo, Z Ma, ... Science China Information Sciences 67 (12), 220101, 2024 | 348 | 2024 |
ChartX & ChartVLM: A versatile benchmark and foundation model for complicated chart reasoning R Xia, B Zhang, H Ye, X Yan, Q Liu, H Zhou, Z Chen, M Dou, B Shi, J Yan, ... arXiv preprint arXiv:2402.12185, 2024 | 37 | 2024 |
Bi3D: Bi-domain active learning for cross-domain 3D object detection J Yuan, B Zhang, X Yan, T Chen, B Shi, Y Li, Y Qiao Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 33 | 2023 |
AD-PT: Autonomous driving pre-training with large-scale point cloud dataset J Yuan, B Zhang, X Yan, B Shi, T Chen, Y Li, Y Qiao Advances in Neural Information Processing Systems 36, 2024 | 29 | 2024 |
StructChart: Perception, structuring, reasoning for visual chart understanding R Xia, B Zhang, H Peng, H Ye, X Yan, P Ye, B Shi, Y Qiao, J Yan arXiv preprint arXiv:2309.11268, 2023 | 21 | 2023 |
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Q Li, Z Chen, W Wang, W Wang, S Ye, Z Jin, G Chen, Y He, Z Gao, E Cui, ... arXiv preprint arXiv:2406.08418, 2024 | 10 | 2024 |
DocGenome: An open large-scale scientific document benchmark for training and testing multi-modal large language models R Xia, S Mao, X Yan, H Zhou, B Zhang, H Peng arXiv preprint arXiv:2406.11633, 2024 | 8* | 2024 |
SPOT: Scalable 3D pre-training via occupancy prediction for autonomous driving X Yan, R Chen, B Zhang, J Yuan, X Cai, B Shi, W Shao, J Yan, P Luo, ... arXiv preprint arXiv:2309.10527, 2023 | 7 | 2023 |
ReSimAD: Zero-shot 3D domain transfer for autonomous driving with source reconstruction and target simulation B Zhang, X Cai, J Yuan, D Yang, J Guo, X Yan, R Xia, B Shi, M Dou, ... arXiv preprint arXiv:2309.05527, 2023 | 6 | 2023 |
Cooperative light-field image super-resolution based on multi-modality embedding and fusion with frequency attention H Yao, J Ren, X Yan, M Ren IEEE Signal Processing Letters 29, 548-552, 2021 | 4 | 2021 |
Self-learning based highly efficient sampling strategy for complex surface reconstruction on contact measurements J Ren, X Yan, L Sun, M Ren 2021 3rd International Conference on Industrial Artificial Intelligence (IAI …, 2021 | 2 | 2021 |
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training R Xia, M Li, H Ye, W Wu, H Zhou, J Yuan, T Peng, X Cai, X Yan, B Wang, ... arXiv preprint arXiv:2412.11863, 2024 | 1 | 2024 |
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback J Yuan, X Yan, B Shi, T Chen, W Ouyang, B Zhang, L Bai, Y Qiao, B Zhou arXiv preprint arXiv:2501.03916, 2025 | | 2025 |
Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy H Ye, J Yuan, R Xia, X Yan, T Chen, J Yan, B Shi, B Zhang arXiv preprint arXiv:2410.09873, 2024 | | 2024 |
Self-supervision based super-resolution approach for light field refocused image X Yan, J Ren, H Yao, M Ren Fourteenth International Conference on Digital Image Processing (ICDIP 2022 …, 2022 | | 2022 |