A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering Y Li, L Wang, B Hu, X Chen, W Zhong, C Lyu, M Zhang Technical Report, 2023 | 33 | 2023 |
LMEye: An Interactive Perception Network for Large Language Models Y Li, B Hu, X Chen, L Ma, Y Xu, M Zhang IEEE Transactions on Multimedia, 2024 | 32 | 2024 |
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Y Li, S Jiang, B Hu, L Wang, W Zhong, W Luo, L Ma, M Zhang IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024 | 23 | 2024 |
Fast and robust online handwritten Chinese character recognition with deep spatial and contextual information fusion network Y Li, Q Yang, Q Chen, B Hu, X Wang, Y Ding, L Ma IEEE Transactions on Multimedia 25, 2140-2152, 2022 | 20 | 2022 |
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning Y Li, X Chen, B Hu, L Wang, H Shi, M Zhang arXiv preprint arXiv:2406.11303, 2024 | 17 | 2024 |
Medical Dialogue Response Generation with Pivotal Information Recalling Y Zhao*, Y Li*, Y Wu, B Hu, Q Chen, X Wang, Y Ding, M Zhang KDD 2022, 2022 | 16 | 2022 |
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context Y Li, B Hu, H Shi, W Wang, L Wang, M Zhang ICML 2024, 2024 | 15 | 2024 |
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues Y Li, B Hu, X Chen, Y Ding, L Ma, M Zhang ACL 2023 Main Conference, 2023 | 14 | 2023 |
Training Multimedia Event Extraction With Generated Images and Captions Z Du, Y Li, X Guo, Y Sun, B Li ACM MM 2023, 2023 | 10 | 2023 |
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations Q Yang*, Y Li*, B Hu, L Ma, Y Ding, M Zhang ACM MM 2022, 2022 | 10 | 2022 |
LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs Y Li, X Chen, B Hu, M Zhang arXiv preprint arXiv:2402.13546, 2024 | 7 | 2024 |
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text Y Li, B Hu, Y Ding, L Ma, M Zhang ACL 2023 Main Conference, 2023 | 7 | 2023 |
GlyphCRM: Bidirectional Encoder Representation for Chinese Character with Its Glyph Y Li, Y Zhao, B Hu, Q Chen, Y Xiang, X Wang, Y Ding, L Ma Technical Report, 2021 | 6 | 2021 |
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation Y Li, H Shi, B Hu, L Wang, J Zhu, J Xu, Z Zhao, M Zhang SIGGRAPH Asia 2024 Conference Papers, 1-11, 2024 | 3 | 2024 |
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs Y Li, B Hu, W Wang, X Cao, M Zhang arXiv preprint arXiv:2311.15759, 2023 | 3 | 2023 |
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation Y Li, B Hu, W Luo, L Ma, Y Ding, M Zhang LREC-COLING 2024, 2024 | 2 | 2024 |
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Y Qin, Y Ye, J Fang, H Wang, S Liang, S Tian, J Zhang, J Li, Y Li, S Huang, ... arXiv preprint arXiv:2501.12326, 2025 | 1 | 2025 |
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment Y Li, X Chen, B Hu, H Shi, M Zhang ACL 2024 Main Conference, 2024 | 1 | 2024 |
A Vision-Language Model with Multi-Granular Knowledge Fusion in Medical Imaging K Chen, Y Li, X Zhu, W Zhang, B Hu World Wide Web 28 (1), 1-21, 2025 | | 2025 |