Red teaming language models with language models

E Perez, S Huang, F Song, T Cai, R Ring… - arXiv preprint arXiv …, 2022 - arxiv.org
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …

Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned

D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai… - arXiv preprint arXiv …, 2022 - arxiv.org
We describe our early efforts to red team language models in order to simultaneously
discover, measure, and attempt to reduce their potentially harmful outputs. We make three …

On evaluating adversarial robustness of large vision-language models

Y Zhao, T Pang, C Du, X Yang, C Li… - Advances in …, 2024 - proceedings.neurips.cc
Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented
performance in response generation, especially with visual inputs, enabling more creative …

Jailbreaking black box large language models in twenty queries

P Chao, A Robey, E Dobriban, H Hassani… - arXiv preprint arXiv …, 2023 - arxiv.org
There is growing interest in ensuring that large language models (LLMs) align with human
values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which …

Explore, establish, exploit: Red teaming language models from scratch

S Casper, J Lin, J Kwon, G Culp… - arXiv preprint arXiv …, 2023 - arxiv.org
Deploying large language models (LLMs) can pose hazards from harmful outputs such as
toxic or dishonest speech. Prior work has introduced tools that elicit harmful outputs in order …

WANLI: Worker and AI collaboration for natural language inference dataset creation

A Liu, S Swayamdipta, NA Smith, Y Choi - arXiv preprint arXiv:2201.05955, 2022 - arxiv.org
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often
rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We …

Findings of the BabyLM Challenge: Sample-efficient pretraining on developmentally plausible corpora

A Warstadt, A Mueller, L Choshen… - … of the BabyLM …, 2023 - research-collection.ethz.ch
Children can acquire language from less than 100 million words of input. Large language
models are far less data-efficient: they typically require 3 or 4 orders of magnitude more data …

State-of-the-art generalisation research in NLP: a taxonomy and review

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …

Can LLMs augment low-resource reading comprehension datasets? Opportunities and challenges

V Samuel, H Aynaou, AG Chowdhury… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive zero-shot performance on a
wide range of NLP tasks, demonstrating the ability to reason and apply commonsense. A …

On robustness of prompt-based semantic parsing with large pre-trained language model: An empirical study on Codex

TY Zhuo, Z Li, Y Huang, F Shiri, W Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Semantic parsing is a technique aimed at constructing a structured representation of the
meaning of a natural-language question. Recent advancements in few-shot language …