Security and privacy challenges of large language models: A survey

BC Das, MH Amini, Y Wu - ACM Computing Surveys, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …

MetaMath: Bootstrap your own mathematical questions for large language models

L Yu, W Jiang, H Shi, J Yu, Z Liu, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have pushed the limits of natural language understanding
and exhibited excellent problem-solving ability. Despite the great success, most existing …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Jailbreak attacks and defenses against large language models: A survey

S Yi, Y Liu, Z Sun, T Cong, X He, J Song, K Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have performed exceptionally in various text-generative
tasks, including question answering, translation, code completion, etc. However, the over …

A comprehensive study of jailbreak attack versus defense for large language models

Z Xu, Y Liu, G Deng, Y Li, S Picek - Findings of the Association for …, 2024 - aclanthology.org
Large Language Models (LLMs) have increasingly become central to generating
content with potential societal impacts. Notably, these models have demonstrated …

A causal explainable guardrails for large language models

Z Chu, Y Wang, L Li, Z Wang, Z Qin, K Ren - Proceedings of the 2024 on …, 2024 - dl.acm.org
Large Language Models (LLMs) have shown impressive performance in natural language
tasks, but their outputs can exhibit undesirable attributes or biases. Existing methods for …

JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models

H Jin, L Hu, X Li, P Zhang, C Chen, J Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of artificial intelligence (AI) through developments in Large Language
Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

SafeDecoding: Defending against jailbreak attacks via safety-aware decoding

Z Xu, F Jiang, L Niu, J Jia, BY Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) become increasingly integrated into real-world
applications such as code generation and chatbot assistance, extensive efforts have been …

Safe unlearning: A surprisingly effective and generalizable solution to defend against jailbreak attacks

Z Zhang, J Yang, P Ke, S Cui, C Zheng, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs are known to be vulnerable to jailbreak attacks, even after safety alignment. An
important observation is that, while different types of jailbreak attacks can generate …