We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …
We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including Topic …
Adversarial attacks on discrete data (such as text) have proved significantly more challenging than attacks on continuous data (such as images), since it is difficult to generate adversarial …
We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this …
Recent work by Zellers et al. (2018) introduced a new task of commonsense natural language inference: given an event description such as "A woman sits at a piano," a …
In this paper we study yes/no questions that are naturally occurring, meaning that they are generated in unprompted and unconstrained settings. We build a reading comprehension …