Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation

Y Wang, T Huang, L Shen, H Yao, H Luo, R Liu… - arXiv preprint arXiv…, 2025 - arxiv.org
Harmful fine-tuning attacks introduce significant security risks to fine-tuning services.
Mainstream defenses aim to vaccinate the model so that later harmful fine-tuning …
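
The snippet does not describe the mechanism, but the title suggests the defense applies a perturbation to the model's weights after fine-tuning. Below is a minimal sketch of that general idea, assuming a precomputed safety perturbation (e.g., derived from a safety-aligned reference checkpoint) is merged into the fine-tuned weights; the function name, the `scale` parameter, and the way `delta` is constructed are all illustrative assumptions, not the paper's actual method.

```python
import torch

def apply_post_finetuning_perturbation(finetuned_state: dict,
                                       perturbation: dict,
                                       scale: float = 1.0) -> dict:
    """Add a precomputed safety perturbation to fine-tuned weights.

    Illustrative sketch only: how the perturbation is actually computed
    (and whether it is additive at all) is an assumption here.
    """
    return {name: w + scale * perturbation[name]
            for name, w in finetuned_state.items()}

# Toy usage: nudge a two-parameter "model" toward a safety-aligned reference.
finetuned = {"w": torch.ones(2), "b": torch.zeros(2)}
aligned_ref = {"w": torch.full((2,), 0.5), "b": torch.full((2,), 0.1)}
delta = {k: aligned_ref[k] - finetuned[k] for k in finetuned}  # assumed perturbation
patched = apply_post_finetuning_perturbation(finetuned, delta, scale=0.3)
```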

Defending LVLMs Against Vision Attacks through Partial-Perception Supervision

Q Zhou, T Li, Q Guo, D Wang, Y Lin, Y Liu… - arXiv preprint arXiv…, 2024 - arxiv.org
Recent studies have raised significant concerns regarding the vulnerability of Large Vision
Language Models (LVLMs) to maliciously injected or perturbed input images, which can …
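
Again, the snippet stops before the method, but the title's "partial-perception supervision" plausibly means checking the model's answer on the full image against answers derived from partial views of it. A minimal sketch of that reading, where the `lvlm` callable, the grid-crop scheme, the `agree` check, and the majority threshold are all hypothetical choices, not the paper's algorithm:

```python
from typing import Callable, List
from PIL import Image

def partial_views(image: Image.Image, grid: int = 2) -> List[Image.Image]:
    """Split an image into grid x grid crops (one simple form of partial perception)."""
    w, h = image.size
    return [image.crop((i * w // grid, j * h // grid,
                        (i + 1) * w // grid, (j + 1) * h // grid))
            for i in range(grid) for j in range(grid)]

def supervised_answer(lvlm: Callable[[Image.Image, str], str],
                      image: Image.Image, prompt: str,
                      agree: Callable[[str, str], bool]) -> str:
    """Answer with the full image, but withhold the response if it conflicts
    with a majority of the partial-view answers. Hypothetical sketch only."""
    full = lvlm(image, prompt)
    partial = [lvlm(view, prompt) for view in partial_views(image)]
    if sum(agree(full, p) for p in partial) < len(partial) / 2:
        return "[withheld: full-image answer inconsistent with partial views]"
    return full

# Toy usage with a dummy model and an exact-match agreement check.
dummy_lvlm = lambda img, q: "a cat"
answer = supervised_answer(dummy_lvlm, Image.new("RGB", (64, 64)),
                           "What is in the image?", lambda a, b: a == b)
```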