Unbridled Icarus: A survey of the potential perils of image inputs in multimodal large language model security
Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities that
increasingly influence various aspects of our daily lives, constantly defining the new …
JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models
The rapid evolution of artificial intelligence (AI) through developments in Large Language
Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …
InstructTA: Instruction-tuned targeted attack for large vision-language models
Large vision-language models (LVLMs) have demonstrated their incredible capability in
image understanding and response generation. However, this rich visual interaction also …
Test-time backdoor attacks on multimodal large language models
Backdoor attacks are commonly executed by contaminating training data, such that a trigger
can activate predetermined harmful effects during the test phase. In this work, we present …
SA-Attack: Improving adversarial transferability of vision-language pre-training models via self-augmentation
Current Visual-Language Pre-training (VLP) models are vulnerable to adversarial examples.
These adversarial examples present substantial security risks to VLP models, as they can …
OT-Attack: Enhancing adversarial transferability of vision-language models via optimal transport optimization
Vision-language pre-training (VLP) models demonstrate impressive abilities in processing
both images and text. However, they are vulnerable to multi-modal adversarial examples …
Agent Smith: A single image can jailbreak one million multimodal LLM agents exponentially fast
A multimodal large language model (MLLM) agent can receive instructions, capture images,
retrieve histories from memory, and decide which tools to use. Nonetheless, red-teaming …
Probing the robustness of vision-language pretrained models: A multimodal adversarial attack approach
Vision-language pretraining (VLP) with transformers has demonstrated exceptional
performance across numerous multimodal tasks. However, the adversarial robustness of …
Multimodal large model pretraining, adaptation and efficiency optimization
L Ji, S Xiao, J Feng, W Gao, H Zhang - Neurocomputing, 2025 - Elsevier
Multimodal large models, leveraging extensive datasets and parameters, have provided
superior solutions for multimodal tasks and have been widely applied across various …
VADS: Visuo-Adaptive DualStrike attack on visual question answering
Visual Question Answering (VQA) is a fundamental task in the computer vision and
natural language processing fields. The adversarial vulnerability of VQA models is crucial for …