Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - Nature Machine …, 2025 - nature.com
We explore machine unlearning in the domain of large language models (LLMs), referred to
as LLM unlearning. This initiative aims to eliminate undesirable data influence (for example …

Threats, attacks, and defenses in machine unlearning: A survey

Z Liu, H Ye, C Chen, Y Zheng… - IEEE Open Journal of the …, 2025 - ieeexplore.ieee.org
Machine Unlearning (MU) has recently gained considerable attention due to its potential to
achieve Safe AI by removing the influence of specific data from trained Machine Learning …

Rethinking LLM memorization through the lens of adversarial compression

A Schwarzschild, Z Feng, P Maini… - Advances in Neural …, 2025 - proceedings.neurips.cc
Large language models (LLMs) trained on web-scale datasets raise substantial concerns
regarding permissible data usage. One major question is whether these models "memorize" …

Large language model unlearning via embedding-corrupted prompts

C Liu, Y Wang, J Flanigan, Y Liu - Advances in Neural …, 2025 - proceedings.neurips.cc
Large language models (LLMs) have advanced to encompass extensive knowledge across
diverse domains. Yet controlling what a large language model should not know is important …

Negative preference optimization: From catastrophic collapse to effective unlearning

R Zhang, L Lin, Y Bai, S Mei - arXiv preprint arXiv:2404.05868, 2024 - arxiv.org
Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data
during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from …

MUSE: Machine unlearning six-way evaluation for language models

W Shi, J Lee, Y Huang, S Malladi, J Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …

Stress-testing capability elicitation with password-locked models

R Greenblatt, F Roger… - Advances in Neural …, 2025 - proceedings.neurips.cc
To determine the safety of large language models (LLMs), AI developers must be able to
assess their dangerous capabilities. But simple prompting strategies often fail to elicit an …

What makes and breaks safety fine-tuning? A mechanistic study

S Jain, ES Lubana, K Oksuz, T Joy… - Advances in …, 2025 - proceedings.neurips.cc
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for
their safe deployment. To better understand the underlying factors that make models safe via …

Guardrail baselines for unlearning in LLMs

P Thaker, Y Maurya, S Hu, ZS Wu, V Smith - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work has demonstrated that finetuning is a promising approach to 'unlearn' concepts
from large language models. However, finetuning can be expensive, as it requires both …

Tamper-resistant safeguards for open-weight LLMs

R Tamirisa, B Bharathi, L Phan, A Zhou, A Gatti… - arXiv preprint arXiv …, 2024 - arxiv.org
Rapid advances in the capabilities of large language models (LLMs) have raised
widespread concerns regarding their potential for malicious use. Open-weight LLMs present …