A survey of attacks on large vision-language models: Resources, advances, and future trends

D Liu, M Yang, X Qu, P Zhou, Y Cheng… - arXiv preprint arXiv…, 2024 - arxiv.org
With the significant development of large models in recent years, Large Vision-Language
Models (LVLMs) have demonstrated remarkable capabilities across a wide range of …

Unbridled Icarus: A survey of the potential perils of image inputs in multimodal large language model security

Y Fan, Y Cao, Z Zhao, Z Liu, S Li - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities that
increasingly influence various aspects of our daily lives, constantly defining the new …

Artificial intelligence for biomedical video generation

L Li, J Qiu, A Saha, L Li, P Li, M He, Z Guo… - arXiv preprint arXiv…, 2024 - arxiv.org
As a prominent subfield of Artificial Intelligence Generated Content (AIGC), video generation
has achieved notable advancements in recent years. The introduction of Sora-like models …

Few-Shot Adversarial Prompt Learning on Vision-Language Models

Y Zhou, X **a, Z Lin, B Han, T Liu - arxiv preprint arxiv:2403.14774, 2024 - arxiv.org
The vulnerability of deep neural networks to imperceptible adversarial perturbations has
attracted widespread attention. Inspired by the success of vision-language foundation …

Adversarial Attacks of Vision Tasks in the Past 10 Years: A Survey

C Zhang, X Xu, J Wu, Z Liu, L Zhou - arXiv preprint arXiv:2410.23687, 2024 - arxiv.org
Adversarial attacks, which manipulate input data to undermine model availability and
integrity, pose significant security threats during machine learning inference. With the advent …

Navigating the risks: A survey of security, privacy, and ethics threats in LLM-based agents

Y Gan, Y Yang, Z Ma, P He, R Zeng, Y Wang… - arXiv preprint arXiv…, 2024 - arxiv.org
With the continuous development of large language models (LLMs), transformer-based
models have made groundbreaking advances in numerous natural language processing …

B-AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Black-box Adversarial Visual-Instructions

H Zhang, W Shao, H Liu, Y Ma, P Luo… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Large Vision-Language Models (LVLMs) have shown significant progress in responding
well to visual-instructions from users. However, these instructions, encompassing images …

Adversarial Prompt Distillation for Vision-Language Models

L Luo, X Wang, B Zi, S Zhao, X Ma - arXiv preprint arXiv:2411.15244, 2024 - arxiv.org
Large pre-trained Vision-Language Models (VLMs) such as Contrastive Language-Image
Pre-Training (CLIP) have been shown to be susceptible to adversarial attacks, raising …

Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning

Z Zhang, S He, B Shen, L Feng - arXiv preprint arXiv:2412.20392, 2024 - arxiv.org
Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations
from large-scale image-text datasets, yet they exhibit significant vulnerabilities to backdoor …

MDAPT: Multi-Modal Depth Adversarial Prompt Tuning to Enhance the Adversarial Robustness of Visual Language Models

C Li, Y Liao, C Ding, Z Ye - Sensors, 2025 - mdpi.com
Large visual language models like Contrastive Language-Image Pre-training (CLIP), despite
their excellent performance, are highly vulnerable to adversarial examples …