[HTML][HTML] When llms meet cybersecurity: A systematic literature review

J Zhang, H Bu, H Wen, Y Liu, H Fei… - …, 2025 - cybersecurity.springeropen.com
The rapid development of large language models (LLMs) has opened new avenues across
various fields, including cybersecurity, which faces an evolving threat landscape and …

Threats, attacks, and defenses in machine unlearning: A survey

Z Liu, H Ye, C Chen, Y Zheng, KY Lam - arxiv preprint arxiv:2403.13682, 2024 - arxiv.org
Machine Unlearning (MU) has recently gained considerable attention due to its potential to
achieve Safe AI by removing the influence of specific data from trained Machine Learning …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arxiv preprint arxiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arxiv preprint arxiv …, 2024 - arxiv.org
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …

Muse: Machine unlearning six-way evaluation for language models

W Shi, J Lee, Y Huang, S Malladi, J Zhao… - arxiv preprint arxiv …, 2024 - arxiv.org
Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …

Guardrail baselines for unlearning in llms

P Thaker, Y Maurya, S Hu, ZS Wu, V Smith - arxiv preprint arxiv …, 2024 - arxiv.org
Recent work has demonstrated that finetuning is a promising approach to'unlearn'concepts
from large language models. However, finetuning can be expensive, as it requires both …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

Open problems in technical ai governance

A Reuel, B Bucknall, S Casper, T Fist, L Soder… - arxiv preprint arxiv …, 2024 - arxiv.org
AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …

Machine unlearning in generative ai: A survey

Z Liu, G Dou, Z Tan, Y Tian, M Jiang - arxiv preprint arxiv:2407.20516, 2024 - arxiv.org
Generative AI technologies have been deployed in many places, such as (multimodal) large
language models and vision generative models. Their remarkable performance should be …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arxiv preprint arxiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …