Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
Segment anything
Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …
Evaluating the social impact of generative AI systems in systems and society
Generative AI systems across modalities, ranging from text, image, audio, and video, have
broad social impacts, but there exists no official standard for means of evaluating those …
Facet: Fairness in computer vision evaluation benchmark
Computer vision models have known performance disparities across attributes such as
gender and skin tone. This means during tasks such as classification and detection, model …
AI's regimes of representation: A community-centered study of text-to-image models in South Asia
This paper presents a community-centered study of cultural limitations of text-to-image (T2I)
models in the South Asian context. We theorize these failures using scholarship on …
Designing responsible AI: Adaptations of UX practice to meet responsible AI challenges
Technology companies continue to invest in efforts to incorporate responsibility in their
Artificial Intelligence (AI) advancements, while efforts to audit and regulate AI systems …
Hate speech classifiers learn normative social stereotypes
Social stereotypes negatively impact individuals' judgments about different groups and may
have a critical role in understanding language directed toward marginalized groups. Here …
Representation in AI evaluations
Calls for representation in artificial intelligence (AI) and machine learning (ML) are
widespread, with "representation" or "representativeness" generally understood to be both …
Eliciting and learning with soft labels from every annotator
The labels used to train machine learning (ML) models are of paramount importance.
Typically for ML classification tasks, datasets contain hard labels, yet learning using soft …
"I wouldn't say offensive but...": Disability-Centered Perspectives on Large Language Models
Large language models (LLMs) trained on real-world data can inadvertently reflect harmful
societal biases, particularly toward historically marginalized communities. While previous …