Ming Yan
Alibaba Group
No verified email address
Title
Cited by
Year
mPLUG-Owl: Modularization empowers large language models with multimodality
Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ...
arXiv preprint arXiv:2304.14178, 2023
Cited by: 833 | Year: 2023
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration
Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu, Q Qian, J Zhang, F Huang
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2024
Cited by: 361 | Year: 2024
X-CLIP: End-to-end multi-grained contrastive learning for video-text retrieval
Y Ma, G Xu, X Sun, M Yan, J Zhang, R Ji
Proceedings of the 30th ACM international conference on multimedia, 638-647, 2022
Cited by: 269 | Year: 2022
Multi-granularity hierarchical attention fusion networks for reading comprehension and question answering
W Wang, M Yan, C Wu
arXiv preprint arXiv:1811.11934, 2018
Cited by: 215 | Year: 2018
StructBERT: Incorporating language structures into pre-training for deep language understanding
W Wang, B Bi, M Yan, C Wu, Z Bao, J Xia, L Peng, L Si
arXiv preprint arXiv:1908.04577, 2019
Cited by: 190 | Year: 2019
mPLUG: Effective and efficient vision-language learning by cross-modal skip-connections
C Li, H Xu, J Tian, W Wang, M Yan, B Bi, J Ye, H Chen, G Xu, Z Cao, ...
arXiv preprint arXiv:2205.12005, 2022
Cited by: 133 | Year: 2022
mPLUG-2: A modularized multi-modal foundation model across text, image and video
H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li, B Bi, Q Qian, W Wang, G Xu, ...
International Conference on Machine Learning, 38728-38748, 2023
Cited by: 132 | Year: 2023
StructuralLM: Structural pre-training for form understanding
C Li, B Bi, M Yan, W Wang, S Huang, F Huang, L Si
arXiv preprint arXiv:2105.11210, 2021
Cited by: 129 | Year: 2021
UReader: Universal OCR-free visually-situated language understanding with multimodal large language model
J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ...
arXiv preprint arXiv:2310.05126, 2023
Cited by: 119 | Year: 2023
E2E-VLP: End-to-end vision-language pre-training enhanced by visual learning
H Xu, M Yan, C Li, B Bi, S Huang, W Xiao, F Huang
arXiv preprint arXiv:2106.01804, 2021
Cited by: 116 | Year: 2021
mPLUG-DocOwl: Modularized multimodal large language model for document understanding
J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ...
arXiv preprint arXiv:2307.02499, 2023
Cited by: 114 | Year: 2023
Evaluation and analysis of hallucination in large vision-language models
J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu, Q Ye, M Yan, J Zhang, J Zhu, ...
arXiv preprint arXiv:2308.15126, 2023
Cited by: 111 | Year: 2023
Unified YouTube video recommendation via cross-network collaboration
M Yan, J Sang, C Xu
Proceedings of the 5th ACM on International Conference on Multimedia …, 2015
Cited by: 93 | Year: 2015
Session-aware information embedding for e-commerce product recommendation
C Wu, M Yan
Proceedings of the 2017 ACM on conference on information and knowledge …, 2017
Cited by: 91 | Year: 2017
mPLUG-DocOwl 1.5: Unified structure learning for OCR-free document understanding
A Hu, H Xu, J Ye, M Yan, L Zhang, B Zhang, C Li, J Zhang, Q Jin, F Huang, ...
arXiv preprint arXiv:2403.12895, 2024
Cited by: 87 | Year: 2024
Friend transfer: Cold-start friend recommendation with cross-platform transfer learning of social knowledge
M Yan, J Sang, T Mei, C Xu
2013 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2013
Cited by: 85 | Year: 2013
An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation
J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia, M Yan, J Zhang, J Sang
arXiv preprint arXiv:2311.07397 1 (2), 8, 2023
Cited by: 84 | Year: 2023
PALM: Pre-training an autoencoding&autoregressive language model for context-conditioned generation
B Bi, C Li, C Wu, M Yan, W Wang, S Huang, F Huang, L Si
arXiv preprint arXiv:2004.07159, 2020
Cited by: 80 | Year: 2020
HiTeA: Hierarchical temporal-aware video-language pre-training
Q Ye, G Xu, M Yan, H Xu, Q Qian, J Zhang, F Huang
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 77 | Year: 2023
Shifting more attention to visual backbone: Query-modulated refinement networks for end-to-end visual grounding
J Ye, J Tian, M Yan, X Yang, X Wang, J Zhang, L He, X Lin
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022
Cited by: 76 | Year: 2022
Articles 1–20