Data and its (dis)contents: A survey of dataset development and use in machine learning research

A Paullada, ID Raji, EM Bender, E Denton, A Hanna - Patterns, 2021 - cell.com
In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …

QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …

A primer in BERTology: What we know about how BERT works

A Rogers, O Kovaleva, A Rumshisky - Transactions of the Association …, 2021 - direct.mit.edu
Transformer-based models have pushed state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

AI and the everything in the whole wide world benchmark

ID Raji, EM Bender, A Paullada, E Denton… - arXiv preprint arXiv …, 2021 - arxiv.org
There is a tendency across different subfields in AI to valorize a small collection of influential
benchmarks. These benchmarks operate as stand-ins for a range of anointed common …

Question and answer test-train overlap in open-domain question answering datasets

P Lewis, P Stenetorp, S Riedel - arXiv preprint arXiv:2008.02637, 2020 - arxiv.org
Ideally, Open-Domain Question Answering models should exhibit a number of
competencies, ranging from simply memorizing questions seen at training time, to …

What will it take to fix benchmarking in natural language understanding?

SR Bowman, GE Dahl - arXiv preprint arXiv:2104.02145, 2021 - arxiv.org
Evaluation for many natural language understanding (NLU) tasks is broken: Unreliable and
biased systems score so highly on standard benchmarks that there is little room for …

WANLI: Worker and AI collaboration for natural language inference dataset creation

A Liu, S Swayamdipta, NA Smith, Y Choi - arXiv preprint arXiv:2201.05955, 2022 - arxiv.org
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often
rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We …

Out of order: How important is the sequential order of words in a sentence in natural language understanding tasks?

TM Pham, T Bui, L Mai, A Nguyen - arXiv preprint arXiv:2012.15180, 2020 - arxiv.org
Do state-of-the-art natural language understanding models care about word order, one of the
most important characteristics of a sequence? Not always! We found 75% to 90% of the …

Beat the AI: Investigating adversarial human annotation for reading comprehension

M Bartolo, A Roberts, J Welbl, S Riedel… - Transactions of the …, 2020 - direct.mit.edu
Innovations in annotation methodology have been a catalyst for Reading Comprehension
(RC) datasets and models. One recent trend to challenge current RC models is to involve a …