MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

A Survey of Multimodal Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Phi-3 technical report: A highly capable language model locally on your phone

M Abdin, J Aneja, H Awadalla, A Awadallah… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion
tokens, whose overall performance, as measured by both academic benchmarks and …

Images are Achilles' heel of alignment: Exploiting visual vulnerabilities for jailbreaking multimodal large language models

Y Li, H Guo, K Zhou, WX Zhao, JR Wen - European Conference on …, 2024 - Springer
In this paper, we study the harmlessness alignment problem of multimodal large language
models (MLLMs). We conduct a systematic empirical analysis of the harmlessness …

AdaShield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting

Y Wang, X Liu, Y Li, M Chen, C Xiao - European Conference on Computer …, 2024 - Springer
With the advent and widespread deployment of Multimodal Large Language Models
(MLLMs), the imperative to ensure their safety has become increasingly pronounced …

Eyes closed, safety on: Protecting multimodal LLMs via image-to-text transformation

Y Gou, K Chen, Z Liu, L Hong, H Xu, Z Li… - … on Computer Vision, 2024 - Springer
Multimodal large language models (MLLMs) have shown impressive reasoning abilities.
However, they are also more vulnerable to jailbreak attacks than their LLM predecessors …

A survey of attacks on large vision-language models: Resources, advances, and future trends

D Liu, M Yang, X Qu, P Zhou, Y Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
With the significant development of large models in recent years, Large Vision-Language
Models (LVLMs) have demonstrated remarkable capabilities across a wide range of …

A survey on evaluation of multimodal large language models

J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic the human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

Arondight: Red teaming large vision language models with auto-generated multi-modal jailbreak prompts

Y Liu, C Cai, X Zhang, X Yuan, C Wang - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Large Vision Language Models (VLMs) extend and enhance the perceptual abilities of
Large Language Models (LLMs). Despite offering new possibilities for LLM applications …