A Survey of Multimodal Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration

F Fui-Hoon Nah, R Zheng, J Cai, K Siau… - Journal of Information …, 2023 - Taylor & Francis
Artificial intelligence (AI) has elicited much attention across disciplines and industries (Hyder
et al., 2019). AI has been defined as “a system's ability to correctly interpret external data, to …

ShareGPT4V: Improving large multi-modal models with better captions

L Chen, J Li, X Dong, P Zhang, C He, J Wang… - … on Computer Vision, 2024 - Springer
Modality alignment serves as the cornerstone for large multi-modal models (LMMs).
However, the impact of different attributes (e.g., data type, quality, and scale) of training data …

VILA: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Visual language models (VLMs) rapidly progressed with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Chat-UniVi: Unified visual representation empowers large language models with image and video understanding

P Jin, R Takanobu, W Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large language models have demonstrated impressive universal capabilities across a wide
range of open-ended tasks and have extended their utility to encompass multimodal …

Are we on the right way for evaluating large vision-language models?

L Chen, J Li, X Dong, P Zhang, Y Zang, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

How robust is Google's Bard to adversarial image attacks?

Y Dong, H Chen, J Chen, Z Fang, X Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal Large Language Models (MLLMs) that integrate text and other modalities
(especially vision) have achieved unprecedented performance in various multimodal tasks …