Teach me to explain: A review of datasets for explainable natural language processing

S Wiegreffe, A Marasović - arXiv preprint arXiv:2102.12060, 2021 - arxiv.org
Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated textual
explanations. These explanations are used downstream in three ways: as data …

The Belebele benchmark: A parallel reading comprehension dataset in 122 language variants

L Bandarkar, D Liang, B Muller, M Artetxe… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset
spanning 122 language variants. Significantly expanding the language coverage of natural …

What will it take to fix benchmarking in natural language understanding?

SR Bowman, GE Dahl - arXiv preprint arXiv:2104.02145, 2021 - arxiv.org
Evaluation for many natural language understanding (NLU) tasks is broken: Unreliable and
biased systems score so highly on standard benchmarks that there is little room for …

WANLI: Worker and AI collaboration for natural language inference dataset creation

A Liu, S Swayamdipta, NA Smith, Y Choi - arXiv preprint arXiv:2201.05955, 2022 - arxiv.org
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often
rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We …

IMPLI: Investigating NLI models' performance on figurative language

K Stowe, P Utama, I Gurevych - … of the 60th Annual Meeting of the …, 2022 - aclanthology.org
Natural language inference (NLI) has been widely used as a task to train and evaluate
models for language understanding. However, the ability of NLI models to perform …

Issues with entailment-based zero-shot text classification

T Ma, JG Yao, CY Lin, T Zhao - … of the 59th Annual Meeting of the …, 2021 - aclanthology.org
The general format of natural language inference (NLI) makes it tempting to use for zero-
shot text classification by casting any target label into a hypothesis sentence and verifying …
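As a minimal sketch of this entailment-based zero-shot setup (not from the cited paper), one can use the Hugging Face transformers zero-shot-classification pipeline with an off-the-shelf NLI backbone; here facebook/bart-large-mnli and the hypothesis template are illustrative choices, and each candidate label is slotted into the template and scored by entailment against the input text.

```python
# Illustrative sketch (not from the cited paper): entailment-based zero-shot
# text classification with an off-the-shelf NLI model.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "The central bank raised interest rates by 50 basis points."
labels = ["economy", "sports", "technology"]

# Each label is cast into a hypothesis ("This text is about {label}.") and the
# NLI model scores whether the premise (the input text) entails it.
result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="This text is about {}.",
)
print(result["labels"][0], result["scores"][0])  # top-ranked label and its score
```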

Models in the loop: Aiding crowdworkers with generative annotation assistants

M Bartolo, T Thrush, S Riedel, P Stenetorp… - arXiv preprint arXiv …, 2021 - arxiv.org
In Dynamic Adversarial Data Collection (DADC), human annotators are tasked with finding
examples that models struggle to predict correctly. Models trained on DADC-collected …

Analyzing dynamic adversarial training data in the limit

E Wallace, A Williams, R Jia, D Kiela - arXiv preprint arXiv:2110.08514, 2021 - arxiv.org
To create models that are robust across a wide range of test inputs, training datasets should
include diverse examples that span numerous phenomena. Dynamic adversarial data …

Fool me twice: Entailment from Wikipedia gamification

JM Eisenschlos, B Dhingra, J Bulian… - arXiv preprint arXiv …, 2021 - arxiv.org
We release FoolMeTwice (FM2 for short), a large dataset of challenging entailment pairs
collected through a fun multi-player game. Gamification encourages adversarial examples …

FarsTail: A Persian natural language inference dataset

H Amirkhani, M AzariJafari, S Faridan-Jahromi… - Soft Computing, 2023 - Springer
With the considerable achievements of data-hungry deep learning methods in natural
language processing tasks, a great amount of effort has been devoted to developing more …