Power to the people? Opportunities and challenges for participatory AI

A Birhane, W Isaac, V Prabhakaran, M Diaz… - Proceedings of the 2nd …, 2022 - dl.acm.org
Participatory approaches to artificial intelligence (AI) and machine learning (ML) are gaining
momentum: the increased attention comes partly with the view that participation opens the …

Data and its (dis)contents: A survey of dataset development and use in machine learning research

A Paullada, ID Raji, EM Bender, E Denton, A Hanna - Patterns, 2021 - cell.com
In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …

The value of standards for health datasets in artificial intelligence-based applications

A Arora, JE Alderman, J Palmer, S Ganapathi… - Nature medicine, 2023 - nature.com
Artificial intelligence as a medical device is increasingly being applied to healthcare for
diagnosis, risk stratification and resource allocation. However, a growing body of evidence …

Taxonomy of risks posed by language models

L Weidinger, J Uesato, M Rauh, C Griffin… - Proceedings of the …, 2022 - dl.acm.org
Responsible innovation on large-scale Language Models (LMs) requires foresight into and
in-depth understanding of the risks these models may pose. This paper develops a …

The participatory turn in AI design: Theoretical foundations and the current state of practice

F Delgado, S Yang, M Madaio, Q Yang - … of the 3rd ACM Conference on …, 2023 - dl.acm.org
Despite the growing consensus that stakeholders affected by AI systems should participate
in their design, enormous variation and implicit disagreements exist among current …

RealToxicityPrompts: Evaluating neural toxic degeneration in language models

S Gehman, S Gururangan, M Sap, Y Choi… - arXiv preprint arXiv …, 2020 - arxiv.org
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise
toxic language which hinders their safe deployment. We investigate the extent to which …

DataPerf: Benchmarks for data-centric AI development

M Mazumder, C Banbury, X Yao… - Advances in …, 2023 - proceedings.neurips.cc
Machine learning research has long focused on models rather than datasets, and
prominent datasets are used for common ML tasks without regard to the breadth, difficulty …

Predictability and surprise in large generative models

D Ganguli, D Hernandez, L Lovitt, A Askell… - Proceedings of the …, 2022 - dl.acm.org
Large-scale pre-training has recently emerged as a technique for creating capable,
general-purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many …

Do datasets have politics? Disciplinary values in computer vision dataset development

MK Scheuerman, A Hanna, E Denton - … of the ACM on Human-Computer …, 2021 - dl.acm.org
Data is a crucial component of machine learning. The field is reliant on data to train, validate,
and test models. With increased technical capabilities, machine learning research has …

On the genealogy of machine learning datasets: A critical history of ImageNet

E Denton, A Hanna, R Amironesei, A Smart… - Big Data & …, 2021 - journals.sagepub.com
In response to growing concerns of bias, discrimination, and unfairness perpetuated by
algorithmic systems, the datasets used to train and evaluate machine learning models have …