The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

GPTFuzzer: Red teaming large language models with auto-generated jailbreak prompts

J Yu, X Lin, Z Yu, X Xing - arXiv preprint arXiv:2309.10253, 2023 - arxiv.org
Large language models (LLMs) have recently experienced tremendous popularity and are
widely used from casual conversations to AI-driven programming. However, despite their …

Fine-tuning aligned language models compromises safety, even when users do not intend to!

X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal… - arXiv preprint arXiv …, 2023 - arxiv.org
Optimizing large language models (LLMs) for downstream use cases often involves the
customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama …

LMSYS-Chat-1M: A large-scale real-world LLM conversation dataset

L Zheng, WL Chiang, Y Sheng, T Li, S Zhuang… - arXiv preprint arXiv …, 2023 - arxiv.org
Studying how people interact with large language models (LLMs) in real-world scenarios is
increasingly important due to their widespread use in various applications. In this paper, we …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Defending against alignment-breaking attacks via robustly aligned LLM

B Cao, Y Cao, L Lin, J Chen - arXiv preprint arXiv:2309.14348, 2023 - arxiv.org
Recently, Large Language Models (LLMs) have made significant advancements and are
now widely used across various domains. Unfortunately, there has been a rising concern …

Introducing v0.5 of the AI Safety Benchmark from MLCommons

B Vidgen, A Agrawal, AM Ahmed, V Akinwande… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the
MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to …

Improving alignment and robustness with circuit breakers

A Zou, L Phan, J Wang, D Duenas, M Lin… - The Thirty-eighth …, 2024 - openreview.net
AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We
present an approach, inspired by recent advances in representation engineering, that …

SORRY-Bench: Systematically evaluating large language model safety refusal behaviors

T Xie, X Qi, Y Zeng, Y Huang, UM Sehwag… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user
requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts …

ChatGPT's one-year anniversary: are open-source large language models catching up?

H Chen, F Jiao, X Li, C Qin, M Ravaut, R Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its release in late 2022, ChatGPT has brought a seismic shift in the entire landscape of
AI, both in research and commerce. Through instruction-tuning a large language model …