Security and privacy challenges of large language models: A survey
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …
contributed to multiple fields, such as generating and summarizing text, language …
The llama 3 herd of models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …
presents a new set of foundation models, called Llama 3. It is a herd of language models …
Trustllm: Trustworthiness in large language models
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …
attention for their excellent natural language processing capabilities. Nonetheless, these …
Jailbreak and guard aligned language models with only few in-context demonstrations
Large Language Models (LLMs) have shown remarkable success in various tasks, yet their
safety and the risk of generating harmful content remain pressing concerns. In this paper, we …
safety and the risk of generating harmful content remain pressing concerns. In this paper, we …
[HTML][HTML] Position: TrustLLM: Trustworthiness in large language models
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …
natural language processing capabilities. Nonetheless, these LLMs present many …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
language models (LLMs). These challenges are organized into three different categories …
Smoothllm: Defending large language models against jailbreaking attacks
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
Images are achilles' heel of alignment: Exploiting visual vulnerabilities for jailbreaking multimodal large language models
In this paper, we study the harmlessness alignment problem of multimodal large language
models (MLLMs). We conduct a systematic empirical analysis of the harmlessness …
models (MLLMs). We conduct a systematic empirical analysis of the harmlessness …
On protecting the data privacy of large language models (llms): A survey
Large language models (LLMs) are complex artificial intelligence systems capable of
understanding, generating and translating human language. They learn language patterns …
understanding, generating and translating human language. They learn language patterns …
Refusal in language models is mediated by a single direction
Conversational large language models are fine-tuned for both instruction-following and
safety, resulting in models that obey benign requests but refuse harmful ones. While this …
safety, resulting in models that obey benign requests but refuse harmful ones. While this …