Grounding and evaluation for large language models: Practical challenges and lessons learned (survey)

K Kenthapadi, M Sameki, A Taly - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
With the ongoing rapid adoption of Artificial Intelligence (AI)-based systems in high-stakes
domains, ensuring the trustworthiness, safety, and observability of these systems has …

Threats, attacks, and defenses in machine unlearning: A survey

Z Liu, H Ye, C Chen, Y Zheng, KY Lam - arXiv preprint arXiv:2403.13682, 2024 - arxiv.org
Machine Unlearning (MU) has recently gained considerable attention due to its potential to
achieve Safe AI by removing the influence of specific data from trained Machine Learning …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

On protecting the data privacy of large language models (LLMs): A survey

B Yan, K Li, M Xu, Y Dong, Y Zhang, Z Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are complex artificial intelligence systems capable of
understanding, generating and translating human language. They learn language patterns …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …

MUSE: Machine unlearning six-way evaluation for language models

W Shi, J Lee, Y Huang, S Malladi, J Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …

Guardrail baselines for unlearning in LLMs

P Thaker, Y Maurya, S Hu, ZS Wu, V Smith - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work has demonstrated that finetuning is a promising approach to 'unlearn' concepts
from large language models. However, finetuning can be expensive, as it requires both …

Tamper-resistant safeguards for open-weight LLMs

R Tamirisa, B Bharathi, L Phan, A Zhou, A Gatti… - arXiv preprint arXiv …, 2024 - arxiv.org
Rapid advances in the capabilities of large language models (LLMs) have raised
widespread concerns regarding their potential for malicious use. Open-weight LLMs present …

Challenging forgets: Unveiling the worst-case forget sets in machine unlearning

C Fan, J Liu, A Hero, S Liu - European Conference on Computer Vision, 2024 - Springer
The trustworthy machine learning (ML) community is increasingly recognizing the crucial
need for models capable of selectively 'unlearning' data points after training. This leads to the …

An adversarial perspective on machine unlearning for AI safety

J Łucki, B Wei, Y Huang, P Henderson… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models are finetuned to refuse questions about hazardous knowledge, but
these protections can often be bypassed. Unlearning methods aim at completely removing …