An adversarial perspective on machine unlearning for AI safety

J Łucki, B Wei, Y Huang, P Henderson… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models are fine-tuned to refuse questions about hazardous knowledge, but
these protections can often be bypassed. Unlearning methods aim at completely removing …

Open problems in machine unlearning for AI safety

F Barez, T Fu, A Prabhu, S Casper, A Sanyal… - arXiv preprint arXiv …, 2025 - arxiv.org
As AI systems become more capable, widely deployed, and increasingly autonomous in
critical areas such as cybersecurity, biological research, and healthcare, ensuring their …

Position: LLM unlearning benchmarks are weak measures of progress

P Thaker, S Hu, N Kale, Y Maurya, ZS Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Unlearning methods have the potential to improve the privacy and safety of large language
models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning …

Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models

H Chen, S Szyller, W Xu, N Himayat - arXiv preprint arXiv:2502.15836, 2025 - arxiv.org
Large language models (LLMs) have become increasingly popular. Their emergent
capabilities can be attributed to their massive training datasets. However, these datasets …

A General Framework to Enhance Fine-tuning-based LLM Unlearning

J Ren, Z Dai, X Tang, H Liu, J Zeng, Z Li… - arXiv preprint arXiv …, 2025 - arxiv.org
Unlearning has been proposed to remove copyrighted and privacy-sensitive data from
Large Language Models (LLMs). Existing approaches primarily rely on fine-tuning-based …

Rethinking the Reliability of Representation Engineering in Large Language Models

Z Deng, J Jiang, G Long, C Zhang - OpenReview - openreview.net
Inspired by cognitive neuroscience, representation engineering (RepE) seeks to connect the
neural activities within large language models (LLMs) to their behaviors, providing a …