Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation
Recent research shows that Large Language Models (LLMs) are vulnerable to harmful fine-tuning attacks -- models lose their safety alignment ability after fine-tuning on a few harmful …