Google Scholar

Data and its (dis)contents: A survey of dataset development and use in machine learning research

A Paullada, ID Raji, EM Bender, E Denton, A Hanna - Patterns, 2021 - cell.com

In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …

Cited by 661 · All 13 versions

[PDF] aaai.org

From 'F' to 'A' on the NY Regents Science Exams: An overview of the Aristo project

P Clark, O Etzioni, T Khot, D Khashabi, B Mishra… - AI Magazine, 2020 - ojs.aaai.org

AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even
Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge …

Cited by 128 · All 12 versions

[PDF] arxiv.org

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org

We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Cited by 433 · All 8 versions

[PDF] arxiv.org

Pretrained transformers improve out-of-distribution robustness

D Hendrycks, X Liu, E Wallace, A Dziedzic… - arXiv preprint arXiv …, 2020 - arxiv.org

Although pretrained Transformers such as BERT achieve high accuracy on in-distribution
examples, do they generalize to new distributions? We systematically measure out-of …

Cited by 489 · All 11 versions

[PDF] neurips.cc

SuperGLUE: A stickier benchmark for general-purpose language understanding systems

A Wang, Y Pruksachatkun, N Nangia… - Advances in neural …, 2019 - proceedings.neurips.cc

In the last year, new models and methods for pretraining and transfer learning have driven
striking performance improvements across a range of language understanding tasks. The …

Cited by 2503 · All 10 versions

[PDF] arxiv.org

HateCheck: Functional tests for hate speech detection models

P Röttger, B Vidgen, D Nguyen, Z Waseem… - arXiv preprint arXiv …, 2020 - arxiv.org

Detecting online hate is a difficult task that even state-of-the-art models struggle with.
Typically, hate speech detection models are evaluated by measuring their performance on …

Cited by 273 · All 8 versions

[PDF] arxiv.org

Certified robustness to adversarial word substitutions

R Jia, A Raghunathan, K Göksel, P Liang - arXiv preprint arXiv …, 2019 - arxiv.org

State-of-the-art NLP models can often be fooled by adversaries that apply seemingly
innocuous label-preserving transformations (e.g., paraphrasing) to input text. The number of …

Cited by 342 · All 5 versions

[PDF] arxiv.org

Measure and improve robustness in NLP models: A survey

X Wang, H Wang, D Yang - arXiv preprint arXiv:2112.08313, 2021 - arxiv.org

As NLP models achieved state-of-the-art performances over benchmarks and gained wide
applications, it has been increasingly important to ensure the safe deployment of these …

Cited by 131 · All 6 versions

[PDF] arxiv.org

MRQA 2019 shared task: Evaluating generalization in reading comprehension

A Fisch, A Talmor, R Jia, M Seo, E Choi… - arXiv preprint arXiv …, 2019 - arxiv.org

We present the results of the Machine Reading for Question Answering (MRQA) 2019
shared task on evaluating the generalization capabilities of reading comprehension …

Cited by 308 · All 10 versions

[PDF] arxiv.org

How can we accelerate progress towards human-like linguistic generalization?

T Linzen - arXiv preprint arXiv:2005.00955, 2020 - arxiv.org

This position paper describes and critiques the Pretraining-Agnostic Identically Distributed
(PAID) evaluation paradigm, which has become a central tool for measuring progress in …

Cited by 212 · All 6 versions
