Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - Nature Machine …, 2025 - nature.com
We explore machine unlearning in the domain of large language models (LLMs), referred to
as LLM unlearning. This initiative aims to eliminate undesirable data influence (for example …
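
A common baseline for the data-removal goal described in this snippet is gradient ascent on the forget set: fine-tune the model to maximize its loss on the data whose influence should be erased. A minimal sketch, assuming a Hugging Face causal LM and a hypothetical forget set; the cited paper surveys many approaches, and this is not necessarily the authors' method:

```python
# Gradient-ascent unlearning sketch (a standard baseline from the LLM
# unlearning literature, not necessarily the cited paper's method).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_texts = ["example sentence whose influence should be removed"]  # hypothetical
for text in forget_texts:
    batch = tok(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])
    loss = -out.loss  # negate to *maximize* the LM loss on forget data
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice this is usually paired with a retain-set term so general capability is not destroyed; the sketch omits that for brevity.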

Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library
Misinformation such as fake news and rumors poses a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Can LLM-generated misinformation be detected?

C Chen, K Shu - arXiv preprint arXiv:2309.13788, 2023 - arxiv.org
The advent of Large Language Models (LLMs) has made a transformative impact. However,
the potential that LLMs such as ChatGPT can be exploited to generate misinformation has …

Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

LMSYS-Chat-1M: A large-scale real-world LLM conversation dataset

L Zheng, WL Chiang, Y Sheng, T Li, S Zhuang… - arXiv preprint arXiv …, 2023 - arxiv.org
Studying how people interact with large language models (LLMs) in real-world scenarios is
increasingly important due to their widespread use in various applications. In this paper, we …
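
For readers who want to work with this corpus: it is published on the Hugging Face Hub under the id below, gated behind accepting the license terms. A minimal loading sketch, with the record field named as on the dataset card:

```python
# Load LMSYS-Chat-1M from the Hugging Face Hub. Access is gated: accept
# the terms on the dataset page and authenticate (e.g. via
# `huggingface-cli login`) before this call will succeed.
from datasets import load_dataset

ds = load_dataset("lmsys/lmsys-chat-1m", split="train")
print(ds[0]["conversation"])  # a list of {role, content} turns per record
```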

Raising the Bar of AI-generated Image Detection with CLIP

D Cozzolino, G Poggi, R Corvi… - Proceedings of the …, 2024 - openaccess.thecvf.com
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs)
for universal detection of AI-generated images. We develop a lightweight detection strategy …
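
A common instantiation of such a lightweight strategy is to freeze CLIP, extract image embeddings, and fit a linear probe on real-vs-generated labels. A sketch under those assumptions, with a placeholder model id and hypothetical file paths; it is not the authors' exact pipeline:

```python
# AI-generated-image detection sketch: frozen CLIP features plus a
# linear probe. Illustrative only; the cited paper's training setup,
# data, and classifier differ in detail.
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def embed(paths):
    imgs = [Image.open(p).convert("RGB") for p in paths]
    inputs = proc(images=imgs, return_tensors="pt")
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

real_paths = ["real_0.jpg"]  # hypothetical labeled real photos
fake_paths = ["fake_0.jpg"]  # hypothetical generated images
X = embed(real_paths + fake_paths)
y = [0] * len(real_paths) + [1] * len(fake_paths)
probe = LogisticRegression(max_iter=1000).fit(X, y)  # linear detector
```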

Soft prompt threats: Attacking safety alignment and unlearning in open-source LLMs through the embedding space

L Schwinn, D Dobre, S Xhonneux… - Advances in …, 2025 - proceedings.neurips.cc
Current research in adversarial robustness of LLMs focuses on discrete input
manipulations in the natural language space, which can be directly transferred to …
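
The contrast drawn here is between discrete token-level attacks and attacks that operate directly on the continuous input embeddings that open-weight models expose. A minimal sketch of that general idea (optimizing a perturbation of the prompt embeddings toward an arbitrary target completion), assuming a Hugging Face causal LM; it is not the paper's exact procedure:

```python
# Embedding-space attack sketch: optimize a continuous perturbation of
# the prompt embeddings so the frozen model assigns high likelihood to
# a chosen target string. Illustrates the general technique only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder open-weight model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
for p in model.parameters():
    p.requires_grad_(False)

prompt_ids = tok("An example prompt", return_tensors="pt").input_ids
target_ids = tok(" a chosen target completion", return_tensors="pt").input_ids

emb = model.get_input_embeddings()
prompt_emb = emb(prompt_ids)  # (1, P, d), frozen
target_emb = emb(target_ids)  # (1, T, d), frozen
delta = torch.zeros_like(prompt_emb, requires_grad=True)  # attack variable
opt = torch.optim.Adam([delta], lr=1e-2)

for _ in range(100):
    inputs_embeds = torch.cat([prompt_emb + delta, target_emb], dim=1)
    # -100 masks prompt positions, so only the target tokens are scored.
    labels = torch.cat([torch.full_like(prompt_ids, -100), target_ids], dim=1)
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```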

Removing RLHF protections in GPT-4 via fine-tuning

Q Zhan, R Fang, R Bindu, A Gupta, T Hashimoto… - arXiv preprint arXiv …, 2023 - arxiv.org
As large language models (LLMs) have increased in their capabilities, so has their
potential for dual use. To reduce harmful outputs, producers and vendors of LLMs have used …

Generative Artificial Intelligence for Software Engineering--A Research Agenda

A Nguyen-Duc, B Cabrero-Daniel, A Przybylek… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Artificial Intelligence (GenAI) tools have become increasingly prevalent in
software development, offering assistance to various managerial and technical project …

Knowledge conflicts for LLMs: A survey

R Xu, Z Qi, Z Guo, C Wang, H Wang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey provides an in-depth analysis of knowledge conflicts for large language models
(LLMs), highlighting the complex challenges they encounter when blending contextual and …