Red-Teaming for generative AI: Silver bullet or security theater?
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …
Large language model supply chain: A research agenda
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …
Privacy in large language models: Attacks, defenses and future directions
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …
HarmBench: A standardized evaluation framework for automated red teaming and robust refusal
Automated red teaming holds substantial promise for uncovering and mitigating the risks
associated with the malicious use of large language models (LLMs), yet the field lacks a …
Jailbreak attacks and defenses against large language models: A survey
Large Language Models (LLMs) have performed exceptionally in various text-generative
tasks, including question answering, translation, code completion, etc. However, the over …
Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models
This survey summarises the most recent methods for building and assessing helpful, honest,
and harmless neural language models, considering small, medium, and large-size models …
LLM defenses are not robust to multi-turn human jailbreaks yet
Recent large language model (LLM) defenses have greatly improved models' ability to
refuse harmful queries, even when adversarially attacked. However, LLM defenses are …
JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models
The rapid evolution of artificial intelligence (AI) through developments in Large Language
Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …
Rainbow teaming: Open-ended generation of diverse adversarial prompts
As large language models (LLMs) become increasingly prevalent across many real-world
applications, understanding and enhancing their robustness to user inputs is of paramount …
Self-supervised visual preference alignment
This paper makes the first attempt towards unsupervised preference alignment in Vision-
Language Models (VLMs). We generate chosen and rejected responses with regard to the …