Hao Tian
Sensetime
Verified email at senseauto.com
Title
Cited by
Year
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites
Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui, W Tong, K Hu, J Luo, Z Ma, ...
Science China Information Sciences 67 (12), 220101, 2024
348 · 2024
BEVFormer v2: Adapting modern image backbones to bird's-eye-view recognition via perspective supervision
C Yang, Y Chen, H Tian, C Tao, X Zhu, Z Zhang, G Huang, H Li, Y Qiao, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
263 · 2023
Ghost in the minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory
X Zhu, Y Chen, H Tian, C Tao, W Su, C Yang, G Huang, B Li, L Lu, ...
arXiv preprint arXiv:2305.17144, 2023
226* · 2023
Delving into the devils of bird's-eye-view perception: A review, evaluation and recipe
H Li, C Sima, J Dai, W Wang, L Lu, H Wang, J Zeng, Z Li, J Yang, H Deng, ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023
149 · 2023
DriveMLM: Aligning multi-modal large language models with behavioral planning states for autonomous driving
W Wang, J Xie, CY Hu, H Zou, J Fan, W Tong, Y Wen, S Wu, H Deng, Z Li, ...
arXiv preprint arXiv:2312.09245, 2023
95 · 2023
Unsupervised object detection with lidar clues
H Tian, Y Chen, J Dai, Z Zhang, X Zhu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
34 · 2021
Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling
Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui, J Zhu, S Ye, H Tian, Z Liu, ...
arXiv preprint arXiv:2412.05271, 2024
22 · 2024
MMInstruct: A high-quality multi-modal instruction tuning dataset with extensive diversity
Y Liu, Y Cao, Z Gao, W Wang, Z Chen, W Wang, H Tian, L Lu, X Zhu, T Lu, ...
Science China Information Sciences 67 (12), 1-16, 2024
15 · 2024
MMIU: Multimodal multi-image understanding for evaluating large vision-language models
F Meng, J Wang, C Li, Q Lu, H Tian, J Liao, X Zhu, J Dai, Y Qiao, P Luo, ...
arXiv preprint arXiv:2408.02718, 2024
10 · 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Q Li, Z Chen, W Wang, W Wang, S Ye, Z Jin, G Chen, Y He, Z Gao, E Cui, ...
arXiv preprint arXiv:2406.08418, 2024
10 · 2024
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation
R Fang, S Yan, Z Huang, J Zhou, H Tian, J Dai, H Li
arXiv preprint arXiv:2311.18835, 2023
9 · 2023
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance
Z Gao, Z Chen, E Cui, Y Ren, W Wang, J Zhu, H Tian, S Ye, J He, X Zhu, ...
Visual Intelligence 2 (1), 1-17, 2024
8 · 2024
PUMA: Empowering unified MLLM with multi-granular visual generation
R Fang, C Duan, K Wang, H Li, H Tian, X Zeng, R Zhao, J Dai, H Li, X Liu
arXiv preprint arXiv:2410.13861, 2024
4 · 2024
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
C Yang, X Dong, X Zhu, W Su, J Wang, H Tian, Z Chen, W Wang, L Lu, ...
arXiv preprint arXiv:2412.09613, 2024
2024
Articles 1–14