Dynabench: Rethinking benchmarking in NLP
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …
Machine learning testing: Survey, landscapes and horizons
This paper provides a comprehensive survey of techniques for testing machine learning
systems, i.e., Machine Learning Testing (ML testing) research. It covers 144 papers on testing …
An empirical study on robustness to spurious correlations using pre-trained language models
Recent work has shown that pre-trained language models such as BERT improve
robustness to spurious correlations in the dataset. Intrigued by these results, we find that the …
Robustness Gym: Unifying the NLP evaluation landscape
Despite impressive performance on standard benchmarks, deep neural networks are often
brittle when deployed in real-world systems. Consequently, recent research has focused on …
Towards debiasing NLU models from unknown biases
NLU models often exploit biases to achieve high dataset-specific performance without
properly learning the intended task. Recently proposed debiasing methods are shown to be …
A fine-grained comparison of pragmatic language understanding in humans and language models
Pragmatics and non-literal language understanding are essential to human communication,
and present a long-standing challenge for artificial language models. We perform a fine …
Quality assurance strategies for machine learning applications in big data analytics: an overview
M Ogrizović, D Drašković, D Bojić - Journal of Big Data, 2024 - Springer
Machine learning (ML) models have gained significant attention in a variety of
applications, from computer vision to natural language processing, and are almost always …
DISCO: Distilling counterfactuals with large language models
Models trained with counterfactually augmented data learn representations of the causal
structure of tasks, enabling robust generalization. However, high-quality counterfactual data …
Are natural language inference models IMPPRESsive? Learning IMPlicature and PRESupposition
Natural language inference (NLI) is an increasingly important task for natural language
understanding, which requires one to infer whether a sentence entails another. However …
Text-CRS: A generalized certified robustness framework against textual adversarial attacks
Language models, especially basic text classification models, have been shown to
be susceptible to textual adversarial attacks such as synonym substitution and word …