Security and privacy challenges of large language models: A survey

BC Das, MH Amini, Y Wu - ACM Computing Surveys, 2025 - dl.acm.org
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Jailbroken: How does LLM safety training fail?

A Wei, N Haghtalab… - Advances in Neural …, 2023 - proceedings.neurips.cc
Large language models trained for safety and harmlessness remain susceptible to
adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases …

" do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models

X Shen, Z Chen, M Backes, Y Shen… - Proceedings of the 2024 on …, 2024 - dl.acm.org
The misuse of large language models (LLMs) has drawn significant attention from the
general public and LLM vendors. One particular type of adversarial prompt, known as …

DecodingTrust: A comprehensive assessment of trustworthiness in GPT models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Multi-step jailbreaking privacy attacks on ChatGPT

H Li, D Guo, W Fan, M Xu, J Huang, F Meng… - arXiv preprint arXiv …, 2023 - arxiv.org
With the rapid progress of large language models (LLMs), many downstream NLP tasks can
be solved well given appropriate prompts. Though model developers and researchers work …

ProPILE: Probing privacy leakage in large language models

S Kim, S Yun, H Lee, M Gubri… - Advances in Neural …, 2023 - proceedings.neurips.cc
The rapid advancement and widespread use of large language models (LLMs) have raised
significant concerns regarding the potential leakage of personally identifiable information …

Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

DeepInception: Hypnotize large language model to be jailbreaker

X Li, Z Zhou, J Zhu, J Yao, T Liu, B Han - arXiv preprint arXiv:2311.03191, 2023 - arxiv.org
Despite remarkable success in various applications, large language models (LLMs) are
vulnerable to adversarial jailbreaks that render their safety guardrails void. However, previous …

Aya model: An instruction finetuned open-access multilingual language model

A Üstün, V Aryabumi, ZX Yong, WY Ko… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent breakthroughs in large language models (LLMs) have centered around a handful of
data-rich languages. What does it take to broaden access to breakthroughs beyond first …