Yunxin Li

Cited by

	All	Since 2020
Citations	220	220
h-index	10	10
i10-index	10	10

Citations per year: 2022: 4, 2023: 44, 2024: 150, 2025: 21

Public access

4 articles

0 articles

available

not available

Based on funding mandates

Co-authors

Baotian Hu	Harbin Institute of Technology (Shenzhen)	Verified email at hit.edu.cn
Min Zhang	Professor of Computer Science, Soochow University	Verified email at suda.edu.cn
Lin Ma	Meituan	Verified email at alumni.cuhk.net
Longyue Wang	Alibaba Group	Verified email at alibaba-inc.com
Wenhan Luo	Associate Professor, HKUST	Verified email at ust.hk
Chenyang Lyu	MBZUAI	Verified email at mbzuai.ac.ae
Yong Xu	Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen	Verified email at hitsz.edu.cn
Yang Xiang	Peng Cheng Laboratory, China	Verified email at pcl.ac.cn
Xiaochun Cao	Sun Yat-sen University	Verified email at mail.sysu.edu.cn
Yuxiang Wu	University College London | Weco AI	Verified email at ucl.ac.uk

Yunxin Li

Harbin Institute of Technology (Shenzhen)

Verified email at stu.hit.edu.cn

Research interests: Multimodal Reasoning, Large Models, AI Agents


Title	Authors	Venue	Cited by	Year
A comprehensive evaluation of GPT-4V on knowledge-intensive visual question answering	Y Li, L Wang, B Hu, X Chen, W Zhong, C Lyu, M Zhang	Technical Report, 2023	33	2023
LMEye: An Interactive Perception Network for Large Language Models	Y Li, B Hu, X Chen, L Ma, Y Xu, M Zhang	IEEE Transactions on Multimedia, 2024	32	2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts	Y Li, S Jiang, B Hu, L Wang, W Zhong, W Luo, L Ma, M Zhang	IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024	23	2024
Fast and robust online handwritten Chinese character recognition with deep spatial and contextual information fusion network	Y Li, Q Yang, Q Chen, B Hu, X Wang, Y Ding, L Ma	IEEE Transactions on Multimedia 25, 2140-2152, 2022	20	2022
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning	Y Li, X Chen, B Hu, L Wang, H Shi, M Zhang	arXiv preprint arXiv:2406.11303, 2024	17	2024
Medical Dialogue Response Generation with Pivotal Information Recalling	Y Zhao, Y Li, Y Wu, B Hu, Q Chen, X Wang, Y Ding, M Zhang	KDD 2022, 2022	16	2022
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context	Y Li, B Hu, H Shi, W Wang, L Wang, M Zhang	ICML 2024, 2024	15	2024
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues	Y Li, B Hu, X Chen, Y Ding, L Ma, M Zhang	ACL 2023 Main Conference, 2023	14	2023
Training Multimedia Event Extraction With Generated Images and Captions	Z Du, Y Li, X Guo, Y Sun, B Li	ACM MM 2023, 2023	10	2023
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations	Q Yang, Y Li, B Hu, L Ma, Y Ding, M Zhang	ACM MM 2022, 2022	10	2022
LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs	Y Li, X Chen, B Hu, M Zhang	arXiv preprint arXiv:2402.13546, 2024	7	2024
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text	Y Li, B Hu, Y Ding, L Ma, M Zhang	ACL 2023 Main Conference, 2023	7	2023
GlyphCRM: Bidirectional encoder representation for Chinese character with its glyph	Y Li, Y Zhao, B Hu, Q Chen, Y Xiang, X Wang, Y Ding, L Ma	Technical Report, 2021	6	2021
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation	Y Li, H Shi, B Hu, L Wang, J Zhu, J Xu, Z Zhao, M Zhang	SIGGRAPH Asia 2024 Conference Papers, 1-11, 2024	3	2024
Towards vision enhancing LLMs: Empowering multimodal knowledge storage and sharing in LLMs	Y Li, B Hu, W Wang, X Cao, M Zhang	arXiv preprint arXiv:2311.15759, 2023	3	2023
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation	Y Li, B Hu, W Luo, L Ma, Y Ding, M Zhang	LREC-COLING 2024, 2024	2	2024
UI-TARS: Pioneering Automated GUI Interaction with Native Agents	Y Qin, Y Ye, J Fang, H Wang, S Liang, S Tian, J Zhang, J Li, Y Li, S Huang, ...	arXiv preprint arXiv:2501.12326, 2025	1	2025
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment	Y Li, X Chen, B Hu, H Shi, M Zhang	ACL 2024 Main Conference, 2024	1	2024
A vision-language model with multi-granular knowledge fusion in medical imaging	K Chen, Y Li, X Zhu, W Zhang, B Hu	World Wide Web 28 (1), 1-21, 2025		2025

Articles 1–19
