Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Weak-to-strong jailbreaking on large language models

X Zhao, X Yang, T Pang, C Du, L Li, YX Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are vulnerable to jailbreak attacks, resulting in harmful,
unethical, or biased text generations. However, existing jailbreaking methods are …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - ACM Transactions on Software …, 2024 - dl.acm.org
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …

Escalation risks from language models in military and diplomatic decision-making

JP Rivera, G Mukobi, A Reuel, M Lamparth… - Proceedings of the …, 2024 - dl.acm.org
Governments are increasingly considering integrating autonomous AI agents in high-stakes
military and foreign-policy decision-making, especially with the emergence of advanced …

CodeChameleon: Personalized encryption framework for jailbreaking large language models

H Lv, X Wang, Y Zhang, C Huang, S Dou, J Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and
ethical protocols, poses a significant challenge for Large Language Models (LLMs). This …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

Mission impossible: A statistical perspective on jailbreaking LLMs

J Su, J Kempe, K Ullrich - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Large language models (LLMs) are trained on a deluge of text data with limited quality
control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as …

Permute-and-flip: An optimally robust and watermarkable decoder for LLMs

X Zhao, L Li, YX Wang - arXiv preprint arXiv:2402.05864, 2024 - arxiv.org
In this paper, we propose a new decoding method called Permute-and-Flip (PF) decoder. It
enjoys robustness properties similar to the standard sampling decoder, but is provably up to …

Rapid optimization for jailbreaking LLMs via subconscious exploitation and echopraxia

G Shen, S Cheng, K Zhang, G Tao, S An, L Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have become prevalent across diverse sectors,
transforming human life with their extraordinary reasoning and comprehension abilities. As …

Position: Technical research and talent is needed for effective AI governance

A Reuel, L Soder, B Bucknall… - Forty-first International …, 2024 - openreview.net
In light of recent advancements in AI capabilities and the increasingly widespread
integration of AI systems into society, governments worldwide are actively seeking to …