Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Holistic evaluation of text-to-image models

T Lee, M Yasunaga, C Meng, Y Mai… - Advances in …, 2023 - proceedings.neurips.cc
The stunning qualitative improvement of text-to-image models has led to their widespread
attention and adoption. However, we lack a comprehensive quantitative understanding of …

TIFA: Accurate and interpretable text-to-image faithfulness evaluation with question answering

Y Hu, B Liu, J Kasai, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite thousands of researchers, engineers, and artists actively working on improving
text-to-image generation models, systems often fail to produce images that accurately align with …

Holistic evaluation of language models

R Bommasani, P Liang, T Lee - … of the New York Academy of …, 2023 - Wiley Online Library
Language models (LMs) like GPT‐3, PaLM, and ChatGPT are the foundation for
almost all major language technologies, but their capabilities, limitations, and risks are not …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

A roadmap to pluralistic alignment

T Sorensen, J Moore, J Fisher, M Gordon… - arXiv preprint arXiv …, 2024 - arxiv.org
With increased power and prevalence of AI systems, it is ever more critical that AI systems
are designed to serve all, i.e., people with diverse values and perspectives. However …

The foundation model transparency index

R Bommasani, K Klyman, S Longpre, S Kapoor… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation models have rapidly permeated society, catalyzing a wave of generative AI
applications spanning enterprise and consumer-facing contexts. While the societal impact of …

The values encoded in machine learning research

A Birhane, P Kalluri, D Card, W Agnew… - Proceedings of the …, 2022 - dl.acm.org
Machine learning currently exerts an outsized influence on the world, increasingly affecting
institutional practices and impacted communities. It is therefore critical that we question …