Rethinking machine unlearning for large language models
We explore machine unlearning in the domain of large language models (LLMs), referred to
as LLM unlearning. This initiative aims to eliminate undesirable data influence (for example …
A comprehensive study of knowledge editing for large language models
Large Language Models (LLMs) have shown extraordinary capabilities in understanding
and generating text that closely mirrors human communication. However, a primary …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
From persona to personalization: A survey on role-playing language agents
Recent advancements in large language models (LLMs) have significantly boosted the rise
of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate …
Defending against unforeseen failure modes with latent adversarial training
Despite extensive diagnostics and debugging by developers, AI systems sometimes exhibit
harmful unintended behaviors. Finding and fixing these is challenging because the attack …
Causal explainable guardrails for large language models
Large Language Models (LLMs) have shown impressive performance in natural language
tasks, but their outputs can exhibit undesirable attributes or biases. Existing methods for …
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …
Securing large language models: Threats, vulnerabilities and responsible practices
Large language models (LLMs) have significantly transformed the landscape of Natural
Language Processing (NLP). Their impact extends across a diverse spectrum of tasks …
On the vulnerability of safety alignment in open-access LLMs
Large language models (LLMs) possess immense capabilities but are susceptible to
malicious exploitation. To mitigate the risk, safety alignment is employed to align LLMs with …
SOUL: Unlocking the power of second-order optimization for LLM unlearning
Large Language Models (LLMs) have highlighted the necessity of effective unlearning
mechanisms to comply with data regulations and ethical AI practices. LLM unlearning aims …