Red-Teaming for Generative AI: Silver Bullet or Security Theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models

S Sicari, JF Cevallos M, A Rizzardi… - ACM Computing …, 2024 - dl.acm.org
This survey summarises the most recent methods for building and assessing helpful, honest,
and harmless neural language models, considering small, medium, and large-size models …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

Defending jailbreak prompts via in-context adversarial game

Y Zhou, Y Han, H Zhuang, K Guo, Z Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate remarkable capabilities across diverse
applications. However, concerns regarding their security, particularly the vulnerability to …

The ethical security of large language models: A systematic review

F Liu, J Jiang, Y Lu, Z Huang, J Jiang - Frontiers of Engineering …, 2025 - Springer
The widespread application of large language models (LLMs) has highlighted new security
challenges and ethical concerns, attracting significant academic and societal attention …

Summon a demon and bind it: A grounded theory of LLM red teaming in the wild

N Inie, J Stray, L Derczynski - arXiv preprint arXiv:2311.06237, 2023 - arxiv.org
Engaging in the deliberate generation of abnormal outputs from large language models
(LLMs) by attacking them is a novel human activity. This paper presents a thorough …

Policy Space Response Oracles: A Survey

A Bighashdel, Y Wang, S McAleer, R Savani… - arXiv preprint arXiv …, 2024 - arxiv.org
In game theory, a game refers to a model of interaction among rational decision-makers or
players, making choices with the goal of achieving their individual objectives. Understanding …

From Natural Language to Extensive-Form Game Representations

S Deng, Y Wang, R Savani - arXiv preprint arXiv:2501.17282, 2025 - arxiv.org
We introduce a framework for translating game descriptions in natural language into
extensive-form representations in game theory, leveraging Large Language Models (LLMs) …

Towards Scalable Automated Alignment of LLMs: A Survey

B Cao, K Lu, X Lu, J Chen, M Ren, H Xiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Alignment is the most critical step in building large language models (LLMs) that meet
human needs. With the rapid development of LLMs gradually surpassing human …

Verbalized Bayesian Persuasion

W Li, Y Lin, X Wang, B Jin, H Zha, B Wang - arXiv preprint arXiv …, 2025 - arxiv.org
Information design (ID) explores how a sender influences the optimal behavior of receivers to
achieve specific objectives. While ID originates from everyday human communication …