Natural language reasoning, a survey

F Yu, H Zhang, P Tiwari, B Wang - ACM Computing Surveys, 2024 - dl.acm.org
This survey article proposes a clearer view of Natural Language Reasoning (NLR) in the
field of Natural Language Processing (NLP), both conceptually and practically …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Evaluating the moral beliefs encoded in LLMs

N Scherrer, C Shi, A Feder… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper presents a case study on the design, administration, post-processing, and
evaluation of surveys on large language models (LLMs). It comprises two components: (1) A …

Large pre-trained language models contain human-like biases of what is right and wrong to do

P Schramowski, C Turan, N Andersen… - Nature Machine …, 2022 - nature.com
Artificial writing is permeating our lives due to recent advances in large-scale, transformer-
based language models (LMs) such as BERT, GPT-2 and GPT-3. Using them as pre-trained …

When to make exceptions: Exploring language models as accounts of human moral judgment

Z Jin, S Levine, F Gonzalez Adauto… - Advances in Neural …, 2022 - proceedings.neurips.cc
AI systems are becoming increasingly intertwined with human life. In order to effectively
collaborate with humans and ensure safety, AI systems need to be able to understand …

Latent hatred: A benchmark for understanding implicit hate speech

M ElSherief, C Ziems, D Muchlinski, V Anupindi… - arXiv preprint arXiv …, 2021 - arxiv.org
Hate speech has grown significantly on social media, causing serious consequences for
victims of all demographics. Despite much attention being paid to characterize and detect …

NLPositionality: Characterizing design biases of datasets and models

S Santy, JT Liang, RL Bras, K Reinecke… - arXiv preprint arXiv …, 2023 - arxiv.org
Design biases in NLP systems, such as performance differences for different populations,
often stem from their creator's positionality, i.e., views and lived experiences shaped by …