Natural language reasoning, a survey

F Yu, H Zhang, P Tiwari, B Wang - ACM Computing Surveys, 2024 - dl.acm.org
This survey article proposes a clearer view of Natural Language Reasoning (NLR) in the
field of Natural Language Processing (NLP), both conceptually and practically …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

From pretraining data to language models to downstream tasks: Tracking the trails of political biases leading to unfair NLP models

S Feng, CY Park, Y Liu, Y Tsvetkov - arXiv preprint arXiv:2305.08283, 2023 - arxiv.org
Language models (LMs) are pretrained on diverse data sources, including news, discussion
forums, books, and online encyclopedias. A significant portion of this data includes opinions …

Evaluating the moral beliefs encoded in LLMs

N Scherrer, C Shi, A Feder… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper presents a case study on the design, administration, post-processing, and
evaluation of surveys on large language models (LLMs). It comprises two components: (1) A …

REFINER: Reasoning feedback on intermediate representations

D Paul, M Ismayilzada, M Peyrard, B Borges… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) have recently shown remarkable performance on reasoning tasks
by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However …

When to make exceptions: Exploring language models as accounts of human moral judgment

Z Jin, S Levine, F Gonzalez Adauto… - Advances in neural …, 2022 - proceedings.neurips.cc
AI systems are becoming increasingly intertwined with human life. In order to effectively
collaborate with humans and ensure safety, AI systems need to be able to understand …

MoCa: Measuring human-language model alignment on causal and moral judgment tasks

A Nie, Y Zhang, AS Amdekar, C Piech… - Advances in …, 2023 - proceedings.neurips.cc
Human commonsense understanding of the physical and social world is organized around
intuitive theories. These theories support making causal and moral judgments. When …

SafetyBench: Evaluating the safety of large language models with multiple choice questions

Z Zhang, L Lei, L Wu, R Sun, Y Huang, C Long… - arXiv preprint arXiv …, 2023 - arxiv.org
With the rapid development of Large Language Models (LLMs), increasing attention has
been paid to their safety concerns. Consequently, evaluating the safety of LLMs has become …

The moral integrity corpus: A benchmark for ethical dialogue systems

C Ziems, JA Yu, YC Wang, A Halevy… - arXiv preprint arXiv …, 2022 - arxiv.org
Conversational agents have come increasingly closer to human competence in open-domain
dialogue settings; however, such models can reflect insensitive, hurtful, or entirely …