Opportunities and risks of large language models in psychiatry

N Obradovich, SS Khalsa, WU Khan, J Suh… - … Digital Psychiatry and …, 2024 - nature.com
The integration of large language models (LLMs) into mental healthcare and research
heralds a potentially transformative shift, one offering enhanced access to care, efficient data …

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated red teaming holds substantial promise for uncovering and mitigating the risks
associated with the malicious use of large language models (LLMs), yet the field lacks a …

Rainbow teaming: Open-ended generation of diverse adversarial prompts

M Samvelyan, SC Raparthy, A Lupu… - Advances in …, 2025 - proceedings.neurips.cc
As large language models (LLMs) become increasingly prevalent across many real-world
applications, understanding and enhancing their robustness to adversarial attacks is of …

ArtPrompt: ASCII art-based jailbreak attacks against aligned LLMs

F Jiang, Z Xu, L Niu, Z Xiang… - Proceedings of the …, 2024 - aclanthology.org
Safety is critical to the usage of large language models (LLMs). Multiple techniques such as
data filtering and supervised fine-tuning have been developed to strengthen LLM safety …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - ACM Transactions on Software …, 2024 - dl.acm.org
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …

Jailbreak attacks and defenses against large language models: A survey

S Yi, Y Liu, Z Sun, T Cong, X He, J Song, K Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have performed exceptionally in various text-generative
tasks, including question answering, translation, code completion, etc. However, the over …

A safe harbor for AI evaluation and red teaming

S Longpre, S Kapoor, K Klyman, A Ramaswami… - arXiv preprint arXiv …, 2024 - arxiv.org
Independent evaluation and red teaming are critical for identifying the risks posed by
generative AI systems. However, the terms of service and enforcement strategies used by …

LLM defenses are not robust to multi-turn human jailbreaks yet

N Li, Z Han, I Steneker, W Primack, R Goodside… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent large language model (LLM) defenses have greatly improved models' ability to
refuse harmful queries, even when adversarially attacked. However, LLM defenses are …