Grounding and evaluation for large language models: Practical challenges and lessons learned (survey)
With the ongoing rapid adoption of Artificial Intelligence (AI)-based systems in high-stakes
domains, ensuring the trustworthiness, safety, and observability of these systems has …
Threats, attacks, and defenses in machine unlearning: A survey
Machine Unlearning (MU) has recently gained considerable attention due to its potential to
achieve Safe AI by removing the influence of specific data from trained Machine Learning …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
On protecting the data privacy of large language models (LLMs): A survey
Large language models (LLMs) are complex artificial intelligence systems capable of
understanding, generating and translating human language. They learn language patterns …
Privacy in large language models: Attacks, defenses and future directions
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …
MUSE: Machine unlearning six-way evaluation for language models
Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …
Guardrail baselines for unlearning in LLMs
Recent work has demonstrated that finetuning is a promising approach to 'unlearn' concepts
from large language models. However, finetuning can be expensive, as it requires both …
Tamper-resistant safeguards for open-weight llms
Rapid advances in the capabilities of large language models (LLMs) have raised
widespread concerns regarding their potential for malicious use. Open-weight LLMs present …
Challenging forgets: Unveiling the worst-case forget sets in machine unlearning
The trustworthy machine learning (ML) community is increasingly recognizing the crucial
need for models capable of selectively 'unlearning' data points after training. This leads to the …
An adversarial perspective on machine unlearning for ai safety
Large language models are finetuned to refuse questions about hazardous knowledge, but
these protections can often be bypassed. Unlearning methods aim at completely removing …