Opportunities and risks of large language models in psychiatry

N Obradovich, SS Khalsa, WU Khan, J Suh… - … Digital Psychiatry and …, 2024 - nature.com
The integration of large language models (LLMs) into mental healthcare and research
heralds a potentially transformative shift, one offering enhanced access to care, efficient data …

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated red teaming holds substantial promise for uncovering and mitigating the risks
associated with the malicious use of large language models (LLMs), yet the field lacks a …

Rainbow teaming: Open-ended generation of diverse adversarial prompts

M Samvelyan, SC Raparthy, A Lupu… - Advances in …, 2025 - proceedings.neurips.cc
As large language models (LLMs) become increasingly prevalent across many real-world
applications, understanding and enhancing their robustness to adversarial attacks is of …

ArtPrompt: ASCII art-based jailbreak attacks against aligned LLMs

F Jiang, Z Xu, L Niu, Z Xiang… - Proceedings of the …, 2024 - aclanthology.org
Safety is critical to the usage of large language models (LLMs). Multiple techniques such as
data filtering and supervised fine-tuning have been developed to strengthen LLM safety …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - ACM Transactions on Software …, 2024 - dl.acm.org
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …

Jailbreak attacks and defenses against large language models: A survey

S Yi, Y Liu, Z Sun, T Cong, X He, J Song, K Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have performed exceptionally in various text-generative
tasks, including question answering, translation, code completion, etc. However, the over …

A safe harbor for AI evaluation and red teaming

S Longpre, S Kapoor, K Klyman, A Ramaswami… - arXiv preprint arXiv …, 2024 - arxiv.org
Independent evaluation and red teaming are critical for identifying the risks posed by
generative AI systems. However, the terms of service and enforcement strategies used by …

LLM defenses are not robust to multi-turn human jailbreaks yet

N Li, Z Han, I Steneker, W Primack, R Goodside… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent large language model (LLM) defenses have greatly improved models' ability to
refuse harmful queries, even when adversarially attacked. However, LLM defenses are …