Security and privacy challenges of large language models: A survey

BC Das, MH Amini, Y Wu - ACM Computing Surveys, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …

MetaMath: Bootstrap your own mathematical questions for large language models

L Yu, W Jiang, H Shi, J Yu, Z Liu, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have pushed the limits of natural language understanding
and exhibited excellent problem-solving ability. Despite the great success, most existing …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Jailbreak attacks and defenses against large language models: A survey

S Yi, Y Liu, Z Sun, T Cong, X He, J Song, K Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have performed exceptionally in various text-generative
tasks, including question answering, translation, code completion, etc. However, the over …

A comprehensive study of jailbreak attack versus defense for large language models

Z Xu, Y Liu, G Deng, Y Li, S Picek - Findings of the Association for …, 2024 - aclanthology.org
Large Language Models (LLMs) have increasingly become central to generating
content with potential societal impacts. Notably, these models have demonstrated …

A causal explainable guardrails for large language models

Z Chu, Y Wang, L Li, Z Wang, Z Qin, K Ren - Proceedings of the 2024 on …, 2024 - dl.acm.org
Large Language Models (LLMs) have shown impressive performance in natural language
tasks, but their outputs can exhibit undesirable attributes or biases. Existing methods for …

JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models

H Jin, L Hu, X Li, P Zhang, C Chen, J Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of artificial intelligence (AI) through developments in Large Language
Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

SafeDecoding: Defending against jailbreak attacks via safety-aware decoding

Z Xu, F Jiang, L Niu, J Jia, BY Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) become increasingly integrated into real-world
applications such as code generation and chatbot assistance, extensive efforts have been …

Safe unlearning: A surprisingly effective and generalizable solution to defend against jailbreak attacks

Z Zhang, J Yang, P Ke, S Cui, C Zheng, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs are known to be vulnerable to jailbreak attacks, even after safety alignment. An
important observation is that, while different types of jailbreak attacks can generate …