Red teaming language models with language models
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
We describe our early efforts to red team language models in order to simultaneously
discover, measure, and attempt to reduce their potentially harmful outputs. We make three …
On evaluating adversarial robustness of large vision-language models
Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented
performance in response generation, especially with visual inputs, enabling more creative …
Jailbreaking black box large language models in twenty queries
There is growing interest in ensuring that large language models (LLMs) align with human
values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which …
Explore, establish, exploit: Red teaming language models from scratch
Deploying large language models (LLMs) can pose hazards from harmful outputs such as
toxic or dishonest speech. Prior work has introduced tools that elicit harmful outputs in order …
WANLI: Worker and AI collaboration for natural language inference dataset creation
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often
rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We …
Findings of the BabyLM Challenge: Sample-efficient pretraining on developmentally plausible corpora
Children can acquire language from less than 100 million words of input. Large language
models are far less data-efficient: they typically require 3 or 4 orders of magnitude more data …
State-of-the-art generalisation research in NLP: A taxonomy and review
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …
Can LLMs augment low-resource reading comprehension datasets? Opportunities and challenges
Large Language Models (LLMs) have demonstrated impressive zero-shot performance on a
wide range of NLP tasks, showing the ability to reason and apply commonsense. A …
On robustness of prompt-based semantic parsing with large pre-trained language model: An empirical study on Codex
Semantic parsing is a technique aimed at constructing a structured representation of the
meaning of a natural-language question. Recent advancements in few-shot language …