Harmful fine-tuning attacks and defenses for large language models: A survey

T Huang, S Hu, F Ilhan, SF Tekin, L Liu - arXiv preprint arXiv:2409.18169, 2024 - arxiv.org
Recent research demonstrates that the nascent fine-tuning-as-a-service business model
exposes serious safety concerns--fine-tuning on a few harmful data points uploaded by users …

[PDF][PDF] Lazy safety alignment for large language models against harmful fine-tuning

T Huang, S Hu, F Ilhan, SF Tekin… - arXiv preprint arXiv …, 2024 - openreview.net
Recent studies show that Large Language Models (LLMs) with safety alignment can be
jailbroken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we …

Targeted vaccine: Safety alignment for large language models against harmful fine-tuning via layer-wise perturbation

G Liu, W Lin, T Huang, R Mo, Q Mu, L Shen - arXiv preprint arXiv …, 2024 - arxiv.org
Harmful fine-tuning attacks pose a serious threat to online fine-tuning services. Vaccine, a
recent alignment-stage defense, applies uniform perturbation to all layers of embedding to …

BadJudge: Backdoor Vulnerabilities of LLM-As-A-Judge

T Tong, F Wang, Z Zhao, M Chen - The Thirteenth International …, 2025 - openreview.net
This paper exposes the backdoor threat in automatic evaluation with LLM-as-a-Judge. We
propose a novel threat model, where the adversary assumes control of both the candidate …

Vaccine: Perturbation-aware alignment for large language models against harmful fine-tuning attack

T Huang, S Hu, L Liu - The Thirty-eighth Annual Conference on …, 2024 - openreview.net
The new paradigm of fine-tuning-as-a-service introduces a new attack surface for Large
Language Models (LLMs): a few harmful data uploaded by users can easily trick the fine …

Lisa: Lazy safety alignment for large language models against harmful fine-tuning attack

T Huang, S Hu, F Ilhan, SF Tekin… - The Thirty-eighth Annual …, 2024 - openreview.net
Recent studies show that Large Language Models (LLMs) with safety alignment can be
jailbroken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature …

Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation

T Huang, S Hu, F Ilhan, SF Tekin, L Liu - arXiv preprint arXiv:2501.17433, 2025 - arxiv.org
Recent research shows that Large Language Models (LLMs) are vulnerable to harmful fine-
tuning attacks--models lose their safety alignment ability after fine-tuning on a few harmful …

Pre-trained Graphformer-based Ranking at Web-scale Search

Y Li, H Xiong, L Kong, Z Sun, H Chen, S Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Both Transformer and Graph Neural Networks (GNNs) have been employed in the domain
of learning to rank (LTR). However, these approaches adhere to two distinct yet …

Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale

Y Li, H Xiong, L Kong, J Bian, S Wang, G Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning to rank (LTR) is widely employed in web search to prioritize pertinent webpages
from retrieved content based on input queries. However, traditional LTR models encounter …