Open problems in machine unlearning for AI safety
As AI systems become more capable, widely deployed, and increasingly autonomous in
critical areas such as cybersecurity, biological research, and healthcare, ensuring their …
Steering language model refusal with sparse autoencoders
K O'Brien, D Majercak, X Fernandes, R Edgar… - arXiv preprint arXiv …, 2024 - arxiv.org
Responsible practices for deploying language models include guiding models to recognize
and refuse answering prompts that are considered unsafe, while complying with safe …
Enhancing Automated Interpretability with Output-Centric Feature Descriptions
Automated interpretability pipelines generate natural language descriptions for the concepts
represented by features in large language models (LLMs), such as plants or the first word in …
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
B Cywiński, K Deja - arXiv preprint arXiv:2501.18052, 2025 - arxiv.org
Recent machine unlearning approaches offer a promising solution for removing unwanted
concepts from diffusion models. However, traditional methods, which largely rely on fine …