Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library
Misinformation such as fake news and rumors poses a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

R Kamoi, Y Zhang, N Zhang, J Han… - Transactions of the …, 2024 - direct.mit.edu
Self-correction is an approach to improving responses from large language models (LLMs)
by refining the responses using LLMs during inference. Prior work has proposed various self …
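
As context for the survey's subject, a minimal sketch of the generate-critique-refine loop that much intrinsic self-correction work follows; the `llm` callable, the prompt wording, and the stopping rule are illustrative assumptions, not the survey's own method.

    from typing import Callable

    def self_correct(llm: Callable[[str], str], question: str, max_rounds: int = 3) -> str:
        """Inference-time self-correction: the same model drafts, critiques,
        and revises its own answer. A generic sketch, not any paper's exact recipe."""
        answer = llm(f"Question: {question}\nAnswer:")
        for _ in range(max_rounds):
            # Ask the model to review its own answer (the self-feedback step).
            critique = llm(
                f"Question: {question}\nProposed answer: {answer}\n"
                "Review the answer. Reply NO ISSUES if it is correct, "
                "otherwise describe the mistakes."
            )
            if "NO ISSUES" in critique:
                break
            # Revise the answer conditioned on the critique (the refinement step).
            answer = llm(
                f"Question: {question}\nPrevious answer: {answer}\n"
                f"Critique: {critique}\nWrite an improved answer:"
            )
        return answer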

GPT-4 technical report

J Achiam, S Adler, S Agarwal, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org
We report the development of GPT-4, a large-scale, multimodal model which can accept
image and text inputs and produce text outputs. While less capable than humans in many …

BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset

J Ji, M Liu, J Dai, X Pan, C Zhang… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety
alignment in large language models (LLMs). This dataset uniquely separates annotations of …
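
To make the decoupling concrete, a hypothetical record shape for a dataset that annotates helpfulness and harmlessness separately for question-answer pairs; the field names below are illustrative assumptions, not the dataset's actual schema.

    from dataclasses import dataclass

    @dataclass
    class QARecord:
        """Illustrative record with decoupled safety annotations;
        field names are hypothetical, not BeaverTails' real schema."""
        prompt: str
        response: str
        is_helpful: bool   # judged purely on usefulness to the asker
        is_harmless: bool  # judged purely on safety, independent of helpfulness

    # Decoupling the two labels lets a response be helpful yet harmful,
    # or harmless yet unhelpful, rather than collapsing both into one score:
    example = QARecord(
        prompt="How do I pick a lock?",
        response="Step-by-step lockpicking instructions ...",
        is_helpful=True,
        is_harmless=False,
    )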

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Safe RLHF: Safe reinforcement learning from human feedback

J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the development of large language models (LLMs), striking a balance between the
performance and safety of AI systems has never been more critical. However, the inherent …
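
One standard way to formalize such a performance-safety balance is constrained policy optimization with separate reward and cost signals; the LaTeX below is a generic sketch of that framing, with the symbols R, C, d, and \lambda as illustrative notation rather than the paper's own.

    % Maximize expected reward while keeping expected cost under a budget d:
    \max_{\theta} \ \mathbb{E}_{x \sim \mathcal{D},\ y \sim \pi_\theta(\cdot \mid x)}
        \left[ R(x, y) \right]
    \quad \text{s.t.} \quad
    \mathbb{E}_{x \sim \mathcal{D},\ y \sim \pi_\theta(\cdot \mid x)}
        \left[ C(x, y) \right] \le d
    % The constraint is typically handled with a Lagrange multiplier \lambda \ge 0,
    % yielding the equivalent saddle-point problem:
    \min_{\lambda \ge 0} \ \max_{\theta} \
        \mathbb{E} \left[ R(x, y) - \lambda \left( C(x, y) - d \right) \right]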

Can LLM-generated misinformation be detected?

C Chen, K Shu - arXiv preprint arXiv:2309.13788, 2023 - arxiv.org
The advent of Large Language Models (LLMs) has made a transformative impact. However,
the potential that LLMs such as ChatGPT can be exploited to generate misinformation has …

Auditing large language models: a three-layered approach

J Mökander, J Schuett, HR Kirk, L Floridi - AI and Ethics, 2024 - Springer
Large language models (LLMs) represent a major advance in artificial intelligence (AI)
research. However, the widespread use of LLMs is also coupled with significant ethical and …

The capacity for moral self-correction in large language models

D Ganguli, A Askell, N Schiefer, TI Liao… - arXiv preprint arXiv …, 2023 - arxiv.org
We test the hypothesis that language models trained with reinforcement learning from
human feedback (RLHF) have the capability to "morally self-correct": to avoid producing …

Evaluating the social impact of generative AI systems in systems and society

I Solaiman, Z Talat, W Agnew, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI systems across modalities, ranging from text (including code) to image, audio,
and video, have broad social impacts, but there is no official standard for means of …