mPLUG-Owl: Modularization empowers large language models with multimodality. Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, et al. arXiv preprint arXiv:2304.14178, 2023. Cited by 827.
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration. Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu, Q Qian, J Zhang, F Huang. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. Cited by 331.
WenLan: Bridging vision and language by large-scale multi-modal pre-training. Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, et al. arXiv preprint arXiv:2103.06561, 2021. Cited by 145.
UReader: Universal OCR-free visually-situated language understanding with multimodal large language model. J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, et al. arXiv preprint arXiv:2310.05126, 2023. Cited by 113.
mPLUG-DocOwl: Modularized multimodal large language model for document understanding. J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, et al. arXiv preprint arXiv:2307.02499, 2023. Cited by 109.
mPLUG-DocOwl 1.5: Unified structure learning for OCR-free document understanding. A Hu, H Xu, J Ye, M Yan, L Zhang, B Zhang, C Li, J Zhang, Q Jin, F Huang, et al. arXiv preprint arXiv:2403.12895, 2024. Cited by 79.
mPLUG-Owl3: Towards long image-sequence understanding in multi-modal large language models. J Ye, H Xu, H Liu, A Hu, M Yan, Q Qian, J Zhang, F Huang, J Zhou. arXiv preprint arXiv:2408.04840, 2024. Cited by 48.
mPLUG-PaperOwl: Scientific diagram analysis with the multimodal large language model. A Hu, Y Shi, H Xu, J Ye, Q Ye, M Yan, C Li, Q Qian, J Zhang, F Huang. Proceedings of the 32nd ACM International Conference on Multimedia, 6929–6938, 2024. Cited by 30.
Leveraging multi-token entities in document-level named entity recognition. A Hu, Z Dou, JY Nie, JR Wen. Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 7961–7968, 2020. Cited by 30.
TinyChart: Efficient chart understanding with visual token merging and program-of-thoughts learning. L Zhang, A Hu, H Xu, M Yan, Y Xu, Q Jin, J Zhang, F Huang. arXiv preprint arXiv:2404.16635, 2024. Cited by 21.
Youku-mPLUG: A 10 million large-scale Chinese video-language dataset for pre-training and benchmarks. H Xu, Q Ye, X Wu, M Yan, Y Miao, J Ye, G Xu, A Hu, Y Shi, G Xu, C Li, et al. arXiv preprint arXiv:2306.04362, 2023. Cited by 21.
A roadmap for big model. S Yuan, H Zhao, S Zhao, J Leng, Y Liang, X Wang, J Yu, X Lv, Z Shao, et al. arXiv preprint arXiv:2203.14101, 2022. Cited by 21.
ICECAP: Information concentrated entity-aware image captioning. A Hu, S Chen, Q Jin. Proceedings of the 28th ACM International Conference on Multimedia, 4217–4225, 2020. Cited by 21.
InfoMetIC: An informative metric for reference-free image caption evaluation. A Hu, S Chen, L Zhang, Q Jin. arXiv preprint arXiv:2305.06002, 2023. Cited by 19.
Question-controlled text-aware image captioning. A Hu, S Chen, Q Jin. Proceedings of the 29th ACM International Conference on Multimedia, 3097–3105, 2021. Cited by 19.
Movie101: A new movie understanding benchmark. Z Yue, Q Zhang, A Hu, L Zhang, Z Wang, Q Jin. arXiv preprint arXiv:2305.12140, 2023. Cited by 14.
mPLUG-DocOwl2: High-resolution compressing for OCR-free multi-page document understanding. A Hu, H Xu, L Zhang, J Ye, M Yan, J Zhang, Q Jin, F Huang, J Zhou. arXiv preprint arXiv:2409.03420, 2024. Cited by 13.
Accommodating audio modality in CLIP for multimodal processing. L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin. Proceedings of the AAAI Conference on Artificial Intelligence 37 (8), 9641–9649, 2023. Cited by 12.
MPMQA: Multimodal question answering on product manuals. L Zhang, A Hu, J Zhang, S Hu, Q Jin. Proceedings of the AAAI Conference on Artificial Intelligence 37 (11), 13958 …, 2023. Cited by 8.
Multimodal pretraining from monolingual to multilingual. L Zhang, L Ruan, A Hu, Q Jin. Machine Intelligence Research 20 (2), 220–232, 2023. Cited by 6.