The Llama 3 herd of models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …
GPTFuzzer: Red teaming large language models with auto-generated jailbreak prompts
Large language models (LLMs) have recently experienced tremendous popularity and are
widely used from casual conversations to AI-driven programming. However, despite their …
Fine-tuning aligned language models compromises safety, even when users do not intend to!
Optimizing large language models (LLMs) for downstream use cases often involves the
customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama …
LMSYS-Chat-1M: A large-scale real-world LLM conversation dataset
Studying how people interact with large language models (LLMs) in real-world scenarios is
increasingly important due to their widespread use in various applications. In this paper, we …
Red-Teaming for generative AI: Silver bullet or security theater?
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …
Defending against alignment-breaking attacks via robustly aligned LLM
Recently, Large Language Models (LLMs) have made significant advancements and are
now widely used across various domains. Unfortunately, there has been a rising concern …
Introducing v0.5 of the AI Safety Benchmark from MLCommons
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the
MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to …
Improving alignment and robustness with circuit breakers
AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We
present an approach, inspired by recent advances in representation engineering, that …
SORRY-Bench: Systematically evaluating large language model safety refusal behaviors
Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user
requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts …
ChatGPT's one-year anniversary: are open-source large language models catching up?
Since its release in late 2022, ChatGPT has brought a seismic shift in the entire landscape of
AI, both in research and commerce. Through instruction-tuning a large language model …