Role play with large language models

M Shanahan, K McDonell, L Reynolds - Nature, 2023 - nature.com
As dialogue agents become increasingly human-like in their performance, we must develop
effective ways to describe their behaviour in high-level terms without falling into the trap of …

Ai alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arxiv preprint arxiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Large language model as attributed training data generator: A tale of diversity and bias

Y Yu, Y Zhuang, J Zhang, Y Meng… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have been recently leveraged as training data generators
for various natural language processing (NLP) tasks. While previous research has explored …

[HTML][HTML] A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics

K He, R Mao, Q Lin, Y Ruan, X Lan, M Feng… - Information …, 2025 - Elsevier
The utilization of large language models (LLMs) for Healthcare has generated both
excitement and concern due to their ability to effectively respond to free-text queries with …

Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arxiv preprint arxiv …, 2024 - arxiv.org
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …

Safe rlhf: Safe reinforcement learning from human feedback

J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang… - arxiv preprint arxiv …, 2023 - arxiv.org
With the development of large language models (LLMs), striking a balance between the
performance and safety of AI systems has never been more critical. However, the inherent …

[PDF][PDF] Managing ai risks in an era of rapid progress

Y Bengio, G Hinton, A Yao, D Song… - arxiv preprint arxiv …, 2023 - blog.biocomm.ai
In this short consensus paper, we outline risks from upcoming, advanced AI systems. We
examine large-scale social harms and malicious uses, as well as an irreversible loss of …

Openchat: Advancing open-source language models with mixed-quality data

G Wang, S Cheng, X Zhan, X Li, S Song… - arxiv preprint arxiv …, 2023 - arxiv.org
Nowadays, open-source large language models like LLaMA have emerged. Recent
developments have incorporated supervised fine-tuning (SFT) and reinforcement learning …

Bridging the gap: A survey on integrating (human) feedback for natural language generation

P Fernandes, A Madaan, E Liu, A Farinhas… - Transactions of the …, 2023 - direct.mit.edu
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …

Scalable watermarking for identifying large language model outputs

S Dathathri, A See, S Ghaisas, PS Huang, R McAdam… - Nature, 2024 - nature.com
Large language models (LLMs) have enabled the generation of high-quality synthetic text,
often indistinguishable from human-written content, at a scale that can markedly affect the …