Harmful fine-tuning attacks and defenses for large language models: A survey
Recent research demonstrates that the nascent fine-tuning-as-a-service business model
exposes serious safety concerns--fine-tuning on a few harmful data points uploaded by users …
Lazy safety alignment for large language models against harmful fine-tuning
Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-
broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we …
Targeted vaccine: Safety alignment for large language models against harmful fine-tuning via layer-wise perturbation
Harmful fine-tuning attacks pose a serious threat to online fine-tuning services. Vaccine, a
recent alignment-stage defense, applies uniform perturbation to the embeddings of all layers to …
BadJudge: Backdoor Vulnerabilities of LLM-As-A-Judge
This paper exposes the backdoor threat in automatic evaluation with LLM-as-a-Judge. We
propose a novel threat model, where the adversary assumes control of both the candidate …
Vaccine: Perturbation-aware alignment for large language models against harmful fine-tuning attack
The new paradigm of fine-tuning-as-a-service introduces a new attack surface for Large
Language Models (LLMs): a few harmful data points uploaded by users can easily trick the fine …
Lisa: Lazy safety alignment for large language models against harmful fine-tuning attack
Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-
broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature …
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation
Recent research shows that Large Language Models (LLMs) are vulnerable to harmful fine-
tuning attacks--models lose their safety alignment ability after fine-tuning on a few harmful …
Pre-trained Graphformer-based Ranking at Web-scale Search
Both Transformers and Graph Neural Networks (GNNs) have been employed in the domain
of learning to rank (LTR). However, these approaches adhere to two distinct yet …
Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale
Learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages
from retrieved content based on input queries. However, traditional LTR models encounter …