Security and privacy challenges of large language models: A survey

BC Das, MH Amini, Y Wu - ACM Computing Surveys, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Jailbreak and guard aligned language models with only few in-context demonstrations

Z Wei, Y Wang, A Li, Y Mo, Y Wang - arXiv preprint arXiv:2310.06387, 2023 - arxiv.org
Large Language Models (LLMs) have shown remarkable success in various tasks, yet their
safety and the risk of generating harmful content remain pressing concerns. In this paper, we …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

SmoothLLM: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arXiv preprint arXiv …, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …

Images are Achilles' heel of alignment: Exploiting visual vulnerabilities for jailbreaking multimodal large language models

Y Li, H Guo, K Zhou, WX Zhao, JR Wen - European Conference on …, 2024 - Springer
In this paper, we study the harmlessness alignment problem of multimodal large language
models (MLLMs). We conduct a systematic empirical analysis of the harmlessness …

On protecting the data privacy of large language models (LLMs): A survey

B Yan, K Li, M Xu, Y Dong, Y Zhang, Z Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are complex artificial intelligence systems capable of
understanding, generating, and translating human language. They learn language patterns …

Refusal in language models is mediated by a single direction

A Arditi, O Obeso, A Syed, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
Conversational large language models are fine-tuned for both instruction-following and
safety, resulting in models that obey benign requests but refuse harmful ones. While this …