Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation

T Huang, S Hu, F Ilhan, SF Tekin, L Liu - arXiv preprint arXiv:2501.17433, 2025 - arxiv.org
Recent research shows that Large Language Models (LLMs) are vulnerable to harmful fine-tuning attacks--models lose their safety alignment ability after fine-tuning on a few harmful …