Xiaotian Han
TikTok
Verified email at bytedance.com
Title
Cited by
Year
Exploring the reasoning abilities of multimodal large language models (mllms): A comprehensive survey on emerging trends in multimodal reasoning
Y Wang, W Chen, X Han, X Lin, H Zhao, Y Liu, B Zhai, J Yuan, Q You, ...
arXiv preprint arXiv:2401.06805, 2024
Cited by: 72* | Year: 2024
Real-time micro-scale temperature imaging at low cost based on fluorescent intensity ratio
J Xiong, M Zhao, X Han, Z Cao, X Wei, Y Chen, C Duan, M Yin
Scientific Reports 7 (1), 41311, 2017
Cited by: 37 | Year: 2017
Mmptrack: Large-scale densely annotated multi-camera multiple people tracking benchmark
X Han, Q You, C Wang, Z Zhang, P Chu, H Hu, J Wang, Z Liu
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2023
Cited by: 36* | Year: 2023
Image scene graph generation (sgg) benchmark
X Han, J Yang, H Hu, L Zhang, J Gao, P Zhang
arXiv preprint arXiv:2107.12604, 2021
Cited by: 34 | Year: 2021
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
X Han, Q You, Y Liu, W Chen, H Zheng, K Mrini, X Lin, Y Wang, B Zhai, ...
arXiv e-prints, arXiv:2311.11567, 2023
Cited by: 14* | Year: 2023
Vitar: Vision transformer with any resolution
Q Fan, Q You, X Han, Y Liu, Y Tao, H Huang, R He, H Yang
arXiv preprint arXiv:2403.18361, 2024
Cited by: 11 | Year: 2024
Infimm-webmath-40b: Advancing multimodal pre-training for enhanced mathematical reasoning
X Han, Y Jian, X Hu, H Liu, Y Wang, Q Fan, Y Ai, H Huang, R He, Z Yang, ...
arXiv preprint arXiv:2409.12568, 2024
Cited by: 7 | Year: 2024
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model
H Liu, Q You, Y Wang, X Han, B Zhai, Y Liu, W Chen, Y Jian, Y Tao, ...
Findings of the Association for Computational Linguistics ACL 2024, 485-492, 2024
Cited by: 7* | Year: 2024
Infimm-hd: A leap forward in high-resolution multimodal understanding
H Liu, Q You, X Han, Y Wang, B Zhai, Y Liu, Y Tao, H Huang, R He, ...
arXiv preprint arXiv:2403.01487, 2024
Cited by: 7* | Year: 2024
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Y Liu, P Li, Z Wei, C Xie, X Hu, X Xu, S Zhang, X Han, H Yang, F Wu
arXiv preprint arXiv:2501.04575, 2025
Cited by: 3 | Year: 2025
Dreamclear: High-capacity real-world image restoration with privacy-safe dataset curation
Y Ai, X Zhou, H Huang, X Han, Z Chen, Q You, H Yang
NeurIPS, 2024
Cited by: 3 | Year: 2024
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
H Liu, Q You, X Han, Y Liu, H Huang, R He, H Yang
Advances in Neural Information Processing Systems 37, 17696-17718, 2025
Cited by: 2 | Year: 2025
COCO is “ALL” You Need for Visual Instruction Fine-tuning
X Han, Y Wang, B Zhai, Q You, H Yang
2024 IEEE International Conference on Multimedia and Expo (ICME), 1-5, 2024
Cited by: 1 | Year: 2024
InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
C Xie, S Cai, W Wang, P Li, Z Sang, K Yang, Y Zhang, Z Li, G Zhu, Z Liu, ...
arXiv preprint arXiv:2502.11573, 2025
Year: 2025
BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data
X Wang, Q Cui, Y Tao, Y Wang, Z Chai, X Han, B Liu, J Yuan, J Su, ...
arXiv preprint arXiv:2410.00773, 2024
Year: 2024
Articles 1–15