Red-Teaming for generative AI: Silver bullet or security theater?
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …
Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models
This survey summarises the most recent methods for building and assessing helpful, honest,
and harmless neural language models, considering small, medium, and large-size models …
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …
Defending jailbreak prompts via in-context adversarial game
Large Language Models (LLMs) demonstrate remarkable capabilities across diverse
applications. However, concerns regarding their security, particularly the vulnerability to …
The ethical security of large language models: A systematic review
F Liu, J Jiang, Y Lu, Z Huang, J Jiang - Frontiers of Engineering …, 2025 - Springer
The widespread application of large language models (LLMs) has highlighted new security
challenges and ethical concerns, attracting significant academic and societal attention …
Summon a demon and bind it: A grounded theory of llm red teaming in the wild
Engaging in the deliberate generation of abnormal outputs from large language models
(LLMs) by attacking them is a novel human activity. This paper presents a thorough …
Policy Space Response Oracles: A Survey
In game theory, a game refers to a model of interaction among rational decision-makers or
players, making choices with the goal of achieving their individual objectives. Understanding …
From Natural Language to Extensive-Form Game Representations
We introduce a framework for translating game descriptions in natural language into
extensive-form representations in game theory, leveraging Large Language Models (LLMs) …
Towards Scalable Automated Alignment of LLMs: A Survey
Alignment is the most critical step in building large language models (LLMs) that meet
human needs. With the rapid development of LLMs gradually surpassing human …
Verbalized Bayesian Persuasion
Information design (ID) explores how a sender influences the optimal behavior of receivers to
achieve specific objectives. While ID originates from everyday human communication …