Haotian Zhang
Research Scientist, Apple
Verified email at apple.com - Homepage
Title
Cited by
Year
Grounded language-image pre-training
LH Li*, P Zhang*, H Zhang*, J Yang, C Li, Y Zhong, L Wang, L Yuan, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
1180 · 2022
GLIPv2: Unifying localization and vision-language understanding
H Zhang*, P Zhang*, X Hu, YC Chen, LH Li, X Dai, L Wang, L Yuan, ...
NeurIPS, 2022
308 · 2022
Ferret: Refer and ground anything anywhere at any granularity
H You*, H Zhang*, Z Gan, X Du, B Zhang, Z Wang, L Cao, SF Chang, ...
ICLR, 2023
256 · 2023
Simple applications of BERT for ad hoc document retrieval
W Yang, H Zhang, J Lin
arXiv preprint arXiv:1903.10972, 2019
242 · 2019
Exploit the connectivity: Multi-object tracking with TrackletNet
G Wang, Y Wang, H Zhang, R Gu, JN Hwang
Proceedings of the 27th ACM international conference on multimedia, 482-490, 2019
239 · 2019
TransMVSNet: Global context-aware multi-view stereo network with transformers
Y Ding, W Yuan, Q Zhu, H Zhang, X Liu, Y Wang, X Liu
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022
215 · 2022
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
B McKinzie*, Z Gan*, JP Fauconnier, S Dodge, B Zhang, P Dufter, D Shah, ...
ECCV, 2024
208 · 2024
An internal learning approach to video inpainting
H Zhang, L Mai, N Xu, Z Wang, J Collomosse, H Jin
Proceedings of the IEEE/CVF international conference on computer vision …, 2019
97 · 2019
Eye in the sky: Drone-based object tracking and 3D localization
H Zhang, G Wang, Z Lei, JN Hwang
Proceedings of the 27th ACM international conference on multimedia, 899-907, 2019
92 · 2019
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
K You, H Zhang, E Schoop, F Weers, A Swearngin, J Nichols, Y Yang, ...
ECCV, 2024
80 · 2024
VisDrone-MOT2019: The vision meets drone multiple object tracking challenge results
L Wen, P Zhu, D Du, X Bian, H Ling, Q Hu, J Zheng, T Peng, X Wang, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019
63 · 2019
VisDrone-SOT2019: The vision meets drone single object tracking challenge results
D Du, P Zhu, L Wen, X Bian, H Ling, Q Hu, J Zheng, T Peng, X Wang, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019
61 · 2019
Apple Intelligence foundation language models
T Gunter, Z Wang, C Wang, R Pang, A Narayanan, A Zhang, B Zhang, ...
arXiv preprint arXiv:2407.21075, 2024
40 · 2024
Ferret-v2: An improved baseline for referring and grounding with large language models
H Zhang
COLM, 2024
29* · 2024
How easy is it to fool your multimodal LLMs? An empirical analysis on deceptive prompts
Y Qian, H Zhang, Y Yang, Z Gan
arXiv preprint arXiv:2402.13220 2 (7), 2024
28 · 2024
From scarcity to efficiency: Improving CLIP training via visual-enriched captions
Z Lai*, H Zhang*, W Wu, H Bai, A Timofeev, X Du, Z Gan, J Shan, ...
ECCV 2024, 2023
27 · 2023
Bundle adjustment for monocular visual odometry based on detections of traffic signs
Y Zhang, H Zhang, G Wang, J Yang, JN Hwang
IEEE transactions on vehicular technology 69 (1), 151-162, 2019
21 · 2019
From scarcity to efficiency: Improving CLIP training via visual-enriched captions
Z Lai*, H Zhang*, B Zhang, W Wu, H Bai, A Timofeev, X Du, Z Gan, J Shan, ...
European Conference on Computer Vision, 111-127, 2025
20* · 2025
Empowering unsupervised domain adaptation with large-scale pre-trained vision-language models
Z Lai, H Bai, H Zhang, X Du, J Shan, Y Yang, CN Chuah, M Cao
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024
18 · 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
H Zhang*, M Gao*, Z Gan*, P Dufter, N Wenzel, F Huang, D Shah, X Du, ...
ICLR 2025, 2024
17 · 2024