Rethinking machine unlearning for large language models
We explore machine unlearning in the domain of large language models (LLMs), referred to
as LLM unlearning. This initiative aims to eliminate undesirable data influence (for example …
Threats, attacks, and defenses in machine unlearning: A survey
Machine Unlearning (MU) has recently gained considerable attention due to its potential to
achieve Safe AI by removing the influence of specific data from trained Machine Learning …
Rethinking LLM memorization through the lens of adversarial compression
Large language models (LLMs) trained on web-scale datasets raise substantial concerns
regarding permissible data usage. One major question is whether these models "memorize" …
Large language model unlearning via embedding-corrupted prompts
Large language models (LLMs) have advanced to encompass extensive knowledge across
diverse domains. Yet controlling what a large language model should not know is important …
Negative preference optimization: From catastrophic collapse to effective unlearning
Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data
during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from …
MUSE: Machine unlearning six-way evaluation for language models
Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …
Stress-testing capability elicitation with password-locked models
To determine the safety of large language models (LLMs), AI developers must be able to
assess their dangerous capabilities. But simple prompting strategies often fail to elicit an …
What makes and breaks safety fine-tuning? A mechanistic study
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for
their safe deployment. To better understand the underlying factors that make models safe via …
Guardrail baselines for unlearning in LLMs
Recent work has demonstrated that finetuning is a promising approach to 'unlearn' concepts
from large language models. However, finetuning can be expensive, as it requires both …
Tamper-resistant safeguards for open-weight LLMs
Rapid advances in the capabilities of large language models (LLMs) have raised
widespread concerns regarding their potential for malicious use. Open-weight LLMs present …