AI pitfalls and what not to do: mitigating bias in AI

JW Gichoya, K Thomas, LA Celi… - The British Journal of …, 2023 - academic.oup.com
Various forms of artificial intelligence (AI) applications are being deployed and used in many
healthcare systems. As the use of these applications increases, we are learning the failures …

Data and its (dis)contents: A survey of dataset development and use in machine learning research

A Paullada, ID Raji, EM Bender, E Denton, A Hanna - Patterns, 2021 - cell.com
In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …

Pervasive label errors in test sets destabilize machine learning benchmarks

CG Northcutt, A Athalye, J Mueller - arXiv preprint arXiv:2103.14749, 2021 - arxiv.org
We identify label errors in the test sets of 10 of the most commonly-used computer vision,
natural language, and audio datasets, and subsequently study the potential for these label …

Dos and don'ts of machine learning in computer security

D Arp, E Quiring, F Pendlebury, A Warnecke… - 31st USENIX Security …, 2022 - usenix.org
With the growing processing power of computing systems and the increasing availability of
massive datasets, machine learning algorithms have led to major breakthroughs in many …

Large image datasets: A pyrrhic win for computer vision?

A Birhane, VU Prabhu - 2021 IEEE Winter Conference on …, 2021 - ieeexplore.ieee.org
In this paper we investigate problematic practices and consequences of large scale vision
datasets (LSVDs). We examine broad issues such as the question of consent and justice as …

Are we done with ImageNet?

L Beyer, OJ Hénaff, A Kolesnikov, X Zhai… - arXiv preprint arXiv …, 2020 - arxiv.org
Yes, and no. We ask whether recent progress on the ImageNet classification benchmark
continues to represent meaningful generalization, or whether the community has started to …

Reduced, reused and recycled: The life of a dataset in machine learning research

B Koch, E Denton, A Hanna, JG Foster - arXiv preprint arXiv:2112.01716, 2021 - arxiv.org
Benchmark datasets play a central role in the organization of machine learning research.
They coordinate researchers around shared research problems and serve as a measure of …

Hyperbolic contrastive learning for visual representations beyond objects

S Ge, S Mishra, S Kornblith, CL Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Although self-/un-supervised methods have led to rapid progress in visual representation
learning, these methods generally treat objects and scenes using the same lens. In this …

BREEDS: Benchmarks for subpopulation shift

S Santurkar, D Tsipras, A Madry - arXiv preprint arXiv:2008.04859, 2020 - arxiv.org
We develop a methodology for assessing the robustness of models to subpopulation shift,
specifically, their ability to generalize to novel data subpopulations that were not observed …

Re-labeling ImageNet: from single to multi-labels, from global to localized labels

S Yun, SJ Oh, B Heo, D Han… - Proceedings of the …, 2021 - openaccess.thecvf.com
ImageNet has been the most popular image classification benchmark, but it is also the one
with a significant level of label noise. Recent studies have shown that many samples contain …