Haiyang Xu
Alibaba Group, DIDI AI LABS, SEU
Verified email at seu.edu.cn - Homepage
Title
Cited by
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality.
Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ...
arXiv preprint arXiv:2304.14178, 2023
Cited by: 842, Year: 2023
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration
Q Ye, H Xu, J Ye, M Yan, H Liu, Q Qian, J Zhang, F Huang, J Zhou
CVPR2024 Highlight, 2023
Cited by: 331, Year: 2023
Learning alignment for multimodal emotion recognition from speech
H Xu, H Zhang, K Han, Y Wang, Y Peng, X Li
InterSpeech2019, 2019
Cited by: 181, Year: 2019
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
C Li, H Xu, J Tian, W Wang, M Yan, ...
EMNLP2022, 2022
Cited by: 160*, Year: 2022
mPLUG-2: A modularized multi-modal foundation model across text, image and video
H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li
International Conference on Machine Learning, ICML, 23-29, 2023
Cited by: 148*, Year: 2023
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
H Xu, M Yan, C Li, B Bi, S Huang, W Xiao, F Huang
ACL2021 Oral, 2021
Cited by: 119, Year: 2021
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ...
EMNLP2023, 2023
Cited by: 113, Year: 2023
mPLUG-DocOwl: Modularized multimodal large language model for document understanding
J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ...
arXiv preprint arXiv:2307.02499, 2023
Cited by: 109, Year: 2023
Evaluation and analysis of hallucination in large vision-language models
J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu, Q Ye, M Yan, J Zhang, J Zhu, ...
arXiv preprint arXiv:2308.15126, 2023
Cited by: 106, Year: 2023
Neural Topic Modeling with Bidirectional Adversarial Training
R Wang, X Hu, D Zhou, Y He, Y Xiong, C Ye, H Xu
ACL2020, 2020
Cited by: 100, Year: 2020
mPLUG-DocOwl 1.5: Unified structure learning for OCR-free document understanding
A Hu, H Xu, J Ye, M Yan, L Zhang, B Zhang, C Li, J Zhang, Q Jin, F Huang, ...
EMNLP2024, 2024
Cited by: 80, Year: 2024
An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation
J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia, H Xu, M Yan, J Zhang, ...
arXiv preprint arXiv:2311.07397, 2023
Cited by: 80, Year: 2023
HiTeA: Hierarchical temporal-aware video-language pre-training
Q Ye, G Xu, M Yan, H Xu, Q Qian, J Zhang, F Huang
ICCV2023, 2022
Cited by: 75, Year: 2022
Hallucination augmented contrastive learning for multimodal large language model
C Jiang, H Xu, M Dong, J Chen, W Ye, M Yan, Q Ye, J Zhang, F Huang, ...
CVPR2024, 2023
Cited by: 70, Year: 2023
Mobile-Agent: Autonomous multi-modal mobile device agent with visual perception
J Wang, H Xu, J Ye, M Yan, W Shen, J Zhang, F Huang, J Sang
ICLR2024 Workshop on Large Language Model (LLM) Agents, 2024
Cited by: 66, Year: 2024
mPLUG: Effective and efficient vision-language learning by cross-modal skip-connections
C Li, H Xu, J Tian, W Wang, M Yan, B Bi, J Ye, H Chen, G Xu, Z Cao, et al.
arXiv preprint arXiv:2205.12005 1 (2), 2022
Cited by: 57, Year: 2022
mPLUG-Owl3: Towards long image-sequence understanding in multi-modal large language models
J Ye, H Xu, H Liu, A Hu, M Yan, Q Qian, J Zhang, F Huang, J Zhou
ICLR2025, 2024
Cited by: 49, Year: 2024
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Y Shi, X Yang, H Xu, C Yuan, B Li, W Hu, ZJ Zha
CVPR2022, 2021
Cited by: 41, Year: 2021
An unsupervised Bayesian modelling approach for storyline detection on news articles
D Zhou, H Xu, Y He
EMNLP2015, 1943-1948, 2015
Cited by: 31, Year: 2015
mPLUG-PaperOwl: Scientific diagram analysis with the multimodal large language model
A Hu, Y Shi, H Xu, J Ye, Q Ye, M Yan, C Li, Q Qian, J Zhang, F Huang
MM2024, 2023
Cited by: 30, Year: 2023