Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Evaluating the social impact of generative AI systems in systems and society

I Solaiman, Z Talat, W Agnew, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI systems across modalities, ranging from text, image, audio, and video, have
broad social impacts, but there exists no official standard for means of evaluating those …

FACET: Fairness in computer vision evaluation benchmark

L Gustafson, C Rolland, N Ravi… - Proceedings of the …, 2023 - openaccess.thecvf.com
Computer vision models have known performance disparities across attributes such as
gender and skin tone. This means during tasks such as classification and detection, model …

AI's regimes of representation: A community-centered study of text-to-image models in South Asia

R Qadri, R Shelby, CL Bennett, E Denton - Proceedings of the 2023 …, 2023 - dl.acm.org
This paper presents a community-centered study of cultural limitations of text-to-image (T2I)
models in the South Asian context. We theorize these failures using scholarship on …

Designing responsible AI: Adaptations of UX practice to meet responsible AI challenges

Q Wang, M Madaio, S Kane, S Kapania… - Proceedings of the …, 2023 - dl.acm.org
Technology companies continue to invest in efforts to incorporate responsibility in their
Artificial Intelligence (AI) advancements, while efforts to audit and regulate AI systems …

Hate speech classifiers learn normative social stereotypes

AM Davani, M Atari, B Kennedy… - Transactions of the …, 2023 - direct.mit.edu
Social stereotypes negatively impact individuals' judgments about different groups and may
have a critical role in understanding language directed toward marginalized groups. Here …

Representation in AI evaluations

AS Bergman, LA Hendricks, M Rauh, B Wu… - Proceedings of the …, 2023 - dl.acm.org
Calls for representation in artificial intelligence (AI) and machine learning (ML) are
widespread, with "representation" or "representativeness" generally understood to be both …

Eliciting and learning with soft labels from every annotator

KM Collins, U Bhatt, A Weller - Proceedings of the AAAI conference on …, 2022 - ojs.aaai.org
The labels used to train machine learning (ML) models are of paramount importance.
Typically for ML classification tasks, datasets contain hard labels, yet learning using soft …

"I wouldn't say offensive but...": Disability-Centered Perspectives on Large Language Models

V Gadiraju, S Kane, S Dev, A Taylor, D Wang… - Proceedings of the …, 2023 - dl.acm.org
Large language models (LLMs) trained on real-world data can inadvertently reflect harmful
societal biases, particularly toward historically marginalized communities. While previous …