Locking Down the Finetuned LLMs Safety

M Zhu, L Yang, Y Wei, N Zhang, Y Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
Fine-tuning large language models (LLMs) on additional datasets is often necessary to
optimize them for specific downstream tasks. However, existing safety alignment measures …

Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

J Yang, D Jin, A Tang, L Shen, D Zhu, Z Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness,
Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI …