Privacy in large language models: Attacks, defenses and future directions
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …
Don't Say No: Jailbreaking LLM by Suppressing Refusal
Ensuring the safety alignment of Large Language Models (LLMs) is crucial to generating
responses consistent with human values. Despite their ability to recognize and avoid …
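A hedged sketch of what a refusal-suppression objective could look like (an illustrative assumption, not the paper's code): the loss rewards an affirmative target continuation while penalizing stock refusal prefixes, and would be minimized by a discrete search over prompt tokens, since text inputs cannot be optimized continuously. It assumes an HF-style causal LM whose output has a .logits attribute.

    import torch
    import torch.nn.functional as F

    def seq_log_prob(model, prompt_ids, cont_ids):
        """Log-probability of a continuation given a prompt, for a causal LM
        returning logits of shape (1, T, V) via an HF-style .logits field."""
        ids = torch.cat([prompt_ids, cont_ids]).unsqueeze(0)
        logits = model(ids).logits[0]           # assumed API
        lp = F.log_softmax(logits[:-1], dim=-1)  # lp[i] predicts token i+1
        p = prompt_ids.numel()
        cont_lp = lp[p - 1:].gather(1, ids[0, p:].unsqueeze(1))
        return cont_lp.sum()

    # Illustrative refusal prefixes; a real attack would tokenize many more.
    def refusal_suppression_loss(model, prompt_ids, target_ids,
                                 refusal_ids_list, lam=1.0):
        """-log p(target | prompt) + lam * max_r log p(refusal_r | prompt):
        minimizing this both elicits the target and pushes refusals down."""
        tgt = seq_log_prob(model, prompt_ids, target_ids)
        ref = torch.stack([seq_log_prob(model, prompt_ids, r)
                           for r in refusal_ids_list]).max()
        return -tgt + lam * ref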
Gradient-based jailbreak images for multimodal fusion models
Augmenting language models with image inputs may enable more effective jailbreak attacks
through continuous optimization, unlike text inputs that require discrete optimization …
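A minimal sketch of the continuous-optimization idea the abstract contrasts with discrete text attacks, assuming a hypothetical fusion model callable on pixel tensors and token ids: projected gradient descent on the image lowers the loss of a target response while keeping the perturbation inside an L-infinity ball.

    import torch
    import torch.nn.functional as F

    def jailbreak_image(model, image, prompt_ids, target_ids,
                        steps=500, eps=16/255, alpha=1/255):
        """PGD-style attack: image is (C, H, W) in [0, 1]; prompt_ids and
        target_ids are 1-D LongTensors. The model call signature and its
        (1, T, V) logits output are assumptions for illustration."""
        adv = image.clone().detach().requires_grad_(True)
        ids = torch.cat([prompt_ids, target_ids]).unsqueeze(0)
        n = target_ids.numel()
        for _ in range(steps):
            logits = model(images=adv.unsqueeze(0), input_ids=ids)
            # Positions -n-1 .. -2 predict the n target tokens.
            loss = F.cross_entropy(logits[0, -n - 1:-1], target_ids)
            loss.backward()
            with torch.no_grad():
                adv -= alpha * adv.grad.sign()  # descend: target more likely
                adv.clamp_(image - eps, image + eps)  # stay in the eps-ball
                adv.clamp_(0.0, 1.0)                  # stay a valid image
            adv.grad = None
        return adv.detach()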
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
Vision Language Models (VLMs) can produce unintended and harmful content when
exposed to adversarial attacks, particularly because their vision capabilities create new …
Towards the Worst-case Robustness of Large Language Models
Recent studies have revealed the vulnerability of Large Language Models (LLMs) to
adversarial attacks, where the adversary crafts specific input sequences to induce harmful …
Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness
This paper investigates the robustness of vision-language models against adversarial visual
perturbations and introduces a novel "double visual defense" to enhance this robustness …
Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack
C Gu, J Gu, A Hua, Y Qin - openreview.net
Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained
attention for their capabilities in image recognition and understanding. However, while …