Red-Teaming for generative AI: Silver bullet or security theater?
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …
Large language model supply chain: A research agenda
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …
Privacy in large language models: Attacks, defenses and future directions
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …
HarmBench: A standardized evaluation framework for automated red teaming and robust refusal
Automated red teaming holds substantial promise for uncovering and mitigating the risks
associated with the malicious use of large language models (LLMs), yet the field lacks a …
Jailbreak attacks and defenses against large language models: A survey
Large Language Models (LLMs) have performed exceptionally in various text-generative
tasks, including question answering, translation, code completion, etc. However, the over …
Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models
This survey summarises the most recent methods for building and assessing helpful, honest,
and harmless neural language models, considering small, medium, and large-size models …
LLM defenses are not robust to multi-turn human jailbreaks yet
Recent large language model (LLM) defenses have greatly improved models' ability to
refuse harmful queries, even when adversarially attacked. However, LLM defenses are …
JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models
The rapid evolution of artificial intelligence (AI) through developments in Large Language
Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …
Rainbow teaming: Open-ended generation of diverse adversarial prompts
As large language models (LLMs) become increasingly prevalent across many real-world
applications, understanding and enhancing their robustness to user inputs is of paramount …
Self-supervised visual preference alignment
This paper makes the first attempt towards unsupervised preference alignment in Vision-
Language Models (VLMs). We generate chosen and rejected responses with regard to the …