Yuanhan Zhang
PhD Candidate, MMLab@NTU
Verified email at e.ntu.edu.sg - Homepage
Title · Cited by · Year
Mmbench: Is your multi-modal model an all-around player?
Y Liu*, H Duan*, Y Zhang*, B Li*, S Zhang*, W Zhao, Y Yuan, J Wang, ...
European Conference on Computer Vision, 216-233, 2025
Cited by 739 · 2025
Mimic-it: Multi-modal in-context instruction tuning
B Li*, Y Zhang*, L Chen, J Wang, F Pu, J Yang, C Li, Z Liu
arXiv preprint arXiv:2306.05425, 2023
Cited by 646 · 2023
Llava-next: Improved reasoning, ocr, and world knowledge
H Liu, C Li, Y Li, B Li, Y Zhang, S Shen, YJ Lee
Cited by 319 · 2024
Llava-onevision: Easy visual task transfer
B Li, Y Zhang, D Guo, R Zhang, F Li, H Zhang, K Zhang, P Zhang, Y Li, ...
arXiv preprint arXiv:2408.03326, 2024
Cited by 259 · 2024
Vbench: Comprehensive benchmark suite for video generative models
Z Huang, Y He, J Yu, F Zhang, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by 224 · 2024
Celeba-spoof: Large-scale face anti-spoofing dataset with rich annotations
Y Zhang, ZF Yin, Y Li, G Yin, J Yan, J Shao, Z Liu
Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020
Cited by 221 · 2020
Neural prompt search
Y Zhang, K Zhou, Z Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Cited by 196 · 2024
Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models
F Li, R Zhang, H Zhang, Y Zhang, B Li, W Li, Z Ma, C Li
arXiv preprint arXiv:2407.07895, 2024
Cited by 109 · 2024
What makes good examples for visual in-context learning?
Y Zhang, K Zhou, Z Liu
Advances in Neural Information Processing Systems 36, 17773-17794, 2023
Cited by 99 · 2023
Llava-next: A strong zero-shot video understanding model
Y Zhang, B Li, H Liu, Y Lee, L Gui, D Fu, J Feng, Z Liu, C Li
Cited by 95* · 2024
Long context transfer from language to vision
P Zhang, K Zhang, B Li, G Zeng, J Yang, Y Zhang, Z Wang, H Tan, C Li, ...
arXiv preprint arXiv:2406.16852, 2024
Cited by 69 · 2024
Octopus: Embodied vision-language programmer from environmental feedback
J Yang, Y Dong, S Liu, B Li, Z Wang, H Tan, C Jiang, J Kang, Y Zhang, ...
European Conference on Computer Vision, 20-38, 2024
Cited by 47 · 2024
Otterhd: A high-resolution multi-modality model
B Li, P Zhang, J Yang, Y Zhang, F Pu, Z Liu
arXiv preprint arXiv:2311.04219, 2023
Cited by 46 · 2023
Learning without forgetting for vision-language models
DW Zhou, Y Zhang, Y Wang, J Ning, HJ Ye, DC Zhan, Z Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
Cited by 42 · 2025
Video instruction tuning with synthetic data
Y Zhang, J Wu, W Li, B Li, Z Ma, Z Liu, C Li
arXiv preprint arXiv:2410.02713, 2024
Cited by 42 · 2024
Lmms-eval: Reality check on the evaluation of large multimodal models
K Zhang, B Li, P Zhang, F Pu, JA Cahyono, K Hu, S Liu, Y Zhang, J Yang, ...
arXiv preprint arXiv:2407.12772, 2024
Cited by 36* · 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
R Zhang, L Gui, Z Sun, Y Feng, K Xu, Y Zhang, D Fu, C Li, A Hauptmann, ...
arXiv preprint arXiv:2404.01258, 2024
Cited by 31* · 2024
Benchmarking omni-vision representation through the lens of visual realms
Y Zhang, Z Yin, J Shao, Z Liu
European Conference on Computer Vision, 594-611, 2022
Cited by 26 · 2022
Funqa: Towards surprising video comprehension
B Xie, S Zhang, Z Zhou, B Li, Y Zhang, J Hessel, J Yang, Z Liu
European Conference on Computer Vision, 39-57, 2024
Cited by 24 · 2024
Bamboo: Building mega-scale vision dataset continually with human-machine synergy
Y Zhang, Q Sun, Y Zhou, Z He, Z Yin, K Wang, L Sheng, Y Qiao, J Shao, ...
arXiv preprint arXiv:2203.07845, 2022
Cited by 19 · 2022
Articles 1–20