Data and its (dis)contents: A survey of dataset development and use in machine learning research

A Paullada, ID Raji, EM Bender, E Denton, A Hanna - Patterns, 2021 - cell.com
In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …

SuperGLUE: A stickier benchmark for general-purpose language understanding systems

A Wang, Y Pruksachatkun, N Nangia… - Advances in Neural …, 2019 - proceedings.neurips.cc
In the last year, new models and methods for pretraining and transfer learning have driven
striking performance improvements across a range of language understanding tasks. The …

From 'F' to 'A' on the NY Regents Science Exams: An overview of the Aristo project

P Clark, O Etzioni, T Khot, D Khashabi, B Mishra… - AI Magazine, 2020 - ojs.aaai.org
AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even
Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge …

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Pretrained transformers improve out-of-distribution robustness

D Hendrycks, X Liu, E Wallace, A Dziedzic… - arXiv preprint arXiv …, 2020 - arxiv.org
Although pretrained Transformers such as BERT achieve high accuracy on in-distribution
examples, do they generalize to new distributions? We systematically measure out-of …

HateCheck: Functional tests for hate speech detection models

P Röttger, B Vidgen, D Nguyen, Z Waseem… - arXiv preprint arXiv …, 2020 - arxiv.org
Detecting online hate is a difficult task that even state-of-the-art models struggle with.
Typically, hate speech detection models are evaluated by measuring their performance on …

Certified robustness to adversarial word substitutions

R Jia, A Raghunathan, K Göksel, P Liang - arXiv preprint arXiv …, 2019 - arxiv.org
State-of-the-art NLP models can often be fooled by adversaries that apply seemingly
innocuous label-preserving transformations (e.g., paraphrasing) to input text. The number of …

MRQA 2019 shared task: Evaluating generalization in reading comprehension

A Fisch, A Talmor, R Jia, M Seo, E Choi… - arXiv preprint arXiv …, 2019 - arxiv.org
We present the results of the Machine Reading for Question Answering (MRQA) 2019
shared task on evaluating the generalization capabilities of reading comprehension …

Measure and improve robustness in NLP models: A survey

X Wang, H Wang, D Yang - arXiv preprint arXiv:2112.08313, 2021 - arxiv.org
As NLP models achieve state-of-the-art performance on benchmarks and gain wide
application, it has become increasingly important to ensure the safe deployment of these …

An empirical study on robustness to spurious correlations using pre-trained language models

L Tu, G Lalwani, S Gella, H He - Transactions of the Association for …, 2020 - direct.mit.edu
Recent work has shown that pre-trained language models such as BERT improve
robustness to spurious correlations in the dataset. Intrigued by these results, we find that the …