Privacy in large language models: Attacks, defenses and future directions
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …
Don't Say No: Jailbreaking LLM by Suppressing Refusal
Ensuring the safety alignment of Large Language Models (LLMs) is crucial to generating
responses consistent with human values. Despite their ability to recognize and avoid …
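A hedged sketch of what a refusal-suppression objective could look like (an illustrative assumption, not the paper's code): the loss rewards an affirmative target continuation while penalizing stock refusal prefixes, and would be minimized by a discrete search over prompt tokens, since text inputs cannot be optimized continuously. It assumes an HF-style causal LM whose output has a .logits attribute.

    import torch
    import torch.nn.functional as F

    def seq_log_prob(model, prompt_ids, cont_ids):
        """Log-probability of a continuation given a prompt, for a causal LM
        returning logits of shape (1, T, V) via an HF-style .logits field."""
        ids = torch.cat([prompt_ids, cont_ids]).unsqueeze(0)
        logits = model(ids).logits[0]           # assumed API
        lp = F.log_softmax(logits[:-1], dim=-1)  # lp[i] predicts token i+1
        p = prompt_ids.numel()
        cont_lp = lp[p - 1:].gather(1, ids[0, p:].unsqueeze(1))
        return cont_lp.sum()

    # Illustrative refusal prefixes; a real attack would tokenize many more.
    def refusal_suppression_loss(model, prompt_ids, target_ids,
                                 refusal_ids_list, lam=1.0):
        """-log p(target | prompt) + lam * max_r log p(refusal_r | prompt):
        minimizing this both elicits the target and pushes refusals down."""
        tgt = seq_log_prob(model, prompt_ids, target_ids)
        ref = torch.stack([seq_log_prob(model, prompt_ids, r)
                           for r in refusal_ids_list]).max()
        return -tgt + lam * ref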
Gradient-based jailbreak images for multimodal fusion models
Augmenting language models with image inputs may enable more effective jailbreak attacks
through continuous optimization, unlike text inputs that require discrete optimization …
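A minimal sketch of the continuous-optimization idea the abstract contrasts with discrete text attacks, assuming a hypothetical fusion model callable on pixel tensors and token ids: projected gradient descent on the image lowers the loss of a target response while keeping the perturbation inside an L-infinity ball.

    import torch
    import torch.nn.functional as F

    def jailbreak_image(model, image, prompt_ids, target_ids,
                        steps=500, eps=16/255, alpha=1/255):
        """PGD-style attack: image is (C, H, W) in [0, 1]; prompt_ids and
        target_ids are 1-D LongTensors. The model call signature and its
        (1, T, V) logits output are assumptions for illustration."""
        adv = image.clone().detach().requires_grad_(True)
        ids = torch.cat([prompt_ids, target_ids]).unsqueeze(0)
        n = target_ids.numel()
        for _ in range(steps):
            logits = model(images=adv.unsqueeze(0), input_ids=ids)
            # Positions -n-1 .. -2 predict the n target tokens.
            loss = F.cross_entropy(logits[0, -n - 1:-1], target_ids)
            loss.backward()
            with torch.no_grad():
                adv -= alpha * adv.grad.sign()  # descend: target more likely
                adv.clamp_(image - eps, image + eps)  # stay in the eps-ball
                adv.clamp_(0.0, 1.0)                  # stay a valid image
            adv.grad = None
        return adv.detach()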
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
Vision Language Models (VLMs) can produce unintended and harmful content when
exposed to adversarial attacks, particularly because their vision capabilities create new …
Towards the Worst-case Robustness of Large Language Models
Recent studies have revealed the vulnerability of Large Language Models (LLMs) to
adversarial attacks, where the adversary crafts specific input sequences to induce harmful …
Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness
This paper investigates the robustness of vision-language models against adversarial visual
perturbations and introduces a novel "double visual defense" to enhance this robustness …
Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack
C Gu, J Gu, A Hua, Y Qin - openreview.net
Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained
attention for their capabilities in image recognition and understanding. However, while …