Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library
Misinformation such as fake news and rumors poses a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - ACM Transactions on Software …, 2024 - dl.acm.org
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

Weak-to-strong jailbreaking on large language models

X Zhao, X Yang, T Pang, C Du, L Li, YX Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Although significant efforts have been dedicated to aligning large language models (LLMs),
red-teaming reports suggest that these carefully aligned LLMs could still be jailbroken …

Escalation risks from language models in military and diplomatic decision-making

JP Rivera, G Mukobi, A Reuel, M Lamparth… - The 2024 ACM …, 2024 - dl.acm.org
Governments are increasingly considering integrating autonomous AI agents in high-stakes
military and foreign-policy decision-making, especially with the emergence of advanced …

Mission impossible: A statistical perspective on jailbreaking LLMs

J Su, J Kempe, K Ullrich - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Large language models (LLMs) are trained on a deluge of text data with limited quality
control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as …

Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs

X Zhao, L Li, YX Wang - arXiv preprint arXiv:2402.05864, 2024 - arxiv.org
In this paper, we propose a new decoding method called Permute-and-Flip (PF) decoder. It
enjoys robustness properties similar to the standard sampling decoder, but is provably up to …

CodeChameleon: Personalized encryption framework for jailbreaking large language models

H Lv, X Wang, Y Zhang, C Huang, S Dou, J Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and
ethical protocols, poses a significant challenge for Large Language Models (LLMs). This …

Rapid optimization for jailbreaking LLMs via subconscious exploitation and echopraxia

G Shen, S Cheng, K Zhang, G Tao, S An, L Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have become prevalent across diverse sectors,
transforming human life with their extraordinary reasoning and comprehension abilities. As …

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI

A Rawat, S Schoepf, G Zizzo, G Cornacchia… - arXiv preprint arXiv …, 2024 - arxiv.org
As generative AI, particularly large language models (LLMs), become increasingly
integrated into production applications, new attack surfaces and vulnerabilities emerge and …