AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

A survey on fairness in large language models

Y Li, M Du, R Song, X Wang, Y Wang - arXiv preprint arXiv:2308.10149, 2023 - arxiv.org
Large language models (LLMs) have shown powerful performance and development
prospects and are widely deployed in the real world. However, LLMs can capture social …

GPT-4 technical report

J Achiam, S Adler, S Agarwal, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org
We report the development of GPT-4, a large-scale, multimodal model which can accept
image and text inputs and produce text outputs. While less capable than humans in many …

Five sources of bias in natural language processing

D Hovy, S Prabhumoye - Language and Linguistics Compass, 2021 - Wiley Online Library
Recently, there has been an increased interest in demographically grounded bias in natural
language processing (NLP) applications. Much of the recent work has focused on describing …

Language (technology) is power: A critical survey of "bias" in NLP

SL Blodgett, S Barocas, H Daumé III… - arXiv preprint arXiv …, 2020 - arxiv.org
We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are
often vague, inconsistent, and lacking in normative reasoning, despite the fact that …

BBQ: A hand-built bias benchmark for question answering

A Parrish, A Chen, N Nangia, V Padmakumar… - arXiv preprint arXiv …, 2021 - arxiv.org
It is well documented that NLP models learn social biases, but little work has been done on
how these biases manifest in model outputs for applied tasks like question answering (QA) …
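As a rough illustration of how a BBQ-style evaluation might be scored, here is a minimal Python sketch. The item fields, the `predict_answer` stub, and the simplified bias rate are hypothetical stand-ins, not the benchmark's actual data format, code, or exact metric (the paper uses a scaled variant of this idea).

```python
# Sketch: scoring a QA model on BBQ-style items. The item format and
# `predict_answer` are hypothetical stand-ins, not the paper's actual code.
from dataclasses import dataclass

@dataclass
class BBQItem:
    context: str          # ambiguous or disambiguated context
    question: str
    options: list[str]    # answer choices, one of which is "unknown"
    label: int            # index of the correct option
    stereotyped: int      # index of the stereotype-consistent option

def predict_answer(item: BBQItem) -> int:
    """Placeholder for a real QA model; here, always answer 'unknown'."""
    return item.options.index("unknown")

def evaluate(items: list[BBQItem]) -> tuple[float, float]:
    correct, biased, committed = 0, 0, 0
    for item in items:
        pred = predict_answer(item)
        correct += pred == item.label
        if item.options[pred] != "unknown":   # model committed to a person
            committed += 1
            biased += pred == item.stereotyped
    accuracy = correct / len(items)
    # Simplified bias rate: share of committed answers that follow the stereotype.
    bias_rate = biased / committed if committed else 0.0
    return accuracy, bias_rate
```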

BOLD: Dataset and metrics for measuring biases in open-ended language generation

J Dhamala, T Sun, V Kumar, S Krishna… - Proceedings of the …, 2021 - dl.acm.org
Recent advances in deep learning techniques have enabled machines to generate
cohesive open-ended text when prompted with a sequence of words as context. While these …
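A minimal sketch of a BOLD-style measurement, assuming generic Hugging Face pipelines: sample open-ended continuations for group-specific prompts and compare an automatic score across groups. The prompts below are invented examples (the real BOLD prompts are scraped from Wikipedia), and sentiment stands in for the paper's fuller metric set (toxicity, regard, psycholinguistic norms).

```python
# Sketch: generate continuations per demographic group and compare mean sentiment.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")

# Hypothetical example prompts, one list per group.
prompts = {
    "group_a": ["The nurse said that"],
    "group_b": ["The engineer said that"],
}

for group, group_prompts in prompts.items():
    texts = [
        generator(p, max_new_tokens=30, do_sample=True)[0]["generated_text"]
        for p in group_prompts
    ]
    # Signed sentiment: positive scores for POSITIVE labels, negative otherwise.
    scores = [
        s["score"] if s["label"] == "POSITIVE" else -s["score"]
        for s in sentiment(texts)
    ]
    print(group, sum(scores) / len(scores))
```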

SuperGLUE: A stickier benchmark for general-purpose language understanding systems

A Wang, Y Pruksachatkun, N Nangia… - Advances in neural …, 2019 - proceedings.neurips.cc
In the last year, new models and methods for pretraining and transfer learning have driven
striking performance improvements across a range of language understanding tasks. The …
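To show the shape of a SuperGLUE evaluation, here is a sketch that loads one of its eight tasks (BoolQ, via the standalone Hugging Face dataset of that name) and scores a trivial majority-class baseline; a real submission would run a model over all tasks and report the aggregate score.

```python
# Sketch: majority-class baseline on BoolQ, one of the SuperGLUE tasks.
from collections import Counter
from datasets import load_dataset

boolq = load_dataset("boolq")

# Most frequent training answer (True/False) becomes the constant prediction.
majority = Counter(boolq["train"]["answer"]).most_common(1)[0][0]

val = boolq["validation"]
accuracy = sum(a == majority for a in val["answer"]) / len(val)
print(f"BoolQ majority-class accuracy: {accuracy:.3f}")
```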

CrowS-pairs: A challenge dataset for measuring social biases in masked language models

N Nangia, C Vania, R Bhalerao… - arXiv preprint arXiv …, 2020 - arxiv.org
Pretrained language models, especially masked language models (MLMs) have seen
success across many NLP tasks. However, there is ample evidence that they use the cultural …
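CrowS-Pairs scores minimal sentence pairs by pseudo-log-likelihood under a masked LM; the sketch below is a simplified version of that idea. Note one deliberate deviation: the paper masks only the tokens shared by both sentences, whereas this sketch masks every token in turn, and the example pair is invented.

```python
# Sketch: compare pseudo-log-likelihoods of a stereotypical vs. an
# anti-stereotypical sentence under BERT (simplified CrowS-Pairs scoring).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id   # mask one token at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

pair = ("The old man couldn't use the computer.",
        "The young man couldn't use the computer.")
scores = [pseudo_log_likelihood(s) for s in pair]
print("model prefers stereotypical sentence:", scores[0] > scores[1])
```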

BERT for coreference resolution: Baselines and analysis

M Joshi, O Levy, DS Weld, L Zettlemoyer - arXiv preprint arXiv:1908.09091, 2019 - arxiv.org
We apply BERT to coreference resolution, achieving strong improvements on the OntoNotes
(+3.9 F1) and GAP (+11.5 F1) benchmarks. A qualitative analysis of model predictions …
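For intuition about what a coreference F1 like the one reported here measures, below is a sketch of a pairwise-link F1 over gold and predicted mention clusters. This is not the paper's metric: the OntoNotes score averages MUC, B-cubed, and CEAF, and the mentions here are invented; the link-based version just illustrates the evaluation's shape.

```python
# Sketch: pairwise-link precision/recall/F1 for coreference clusters.
from itertools import combinations

def links(clusters: list[list[str]]) -> set[frozenset[str]]:
    """All unordered mention pairs that corefer within some cluster."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}

def link_f1(gold: list[list[str]], pred: list[list[str]]) -> float:
    g, p = links(gold), links(pred)
    if not g or not p:
        return 0.0
    precision = len(g & p) / len(p)
    recall = len(g & p) / len(g)
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

gold = [["Mary", "she", "her"], ["John", "he"]]
pred = [["Mary", "she"], ["John", "he", "her"]]
print(f"link F1 = {link_f1(gold, pred):.2f}")  # 0.50 on this toy example
```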