MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

A Comprehensive Survey of Multimodal Large Language Models: Concept, Application and Safety

S Liu, W Pu, C Xu, Z Huang, Q Li, H Wang, C Lin… - 2024 - researchsquare.com
Recent advancements in MLLMs, exemplified by developments like GPT-4o,
have positioned them as a significant focus within the research community. MLLMs leverage …

CIEASR: Contextual Image-Enhanced Automatic Speech Recognition for Improved Homophone Discrimination

Z Wang, Y Rong, D Jiang, H Wu, S Zhou… - Proceedings of the 32nd …, 2024 - dl.acm.org
Automatic Speech Recognition (ASR) models pre-trained on large-scale speech datasets
have achieved significant breakthroughs compared with traditional methods. However …