Inference-time intervention: Eliciting truthful answers from a language model

K Li, O Patel, F Viégas, H Pfister… - Advances in Neural …, 2023 - proceedings.neurips.cc
Abstract We introduce Inference-Time Intervention (ITI), a technique designed to enhance
the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations …

Self-play fine-tuning converts weak language models to strong language models

Z Chen, Y Deng, H Yuan, K Ji, Q Gu - arXiv preprint arXiv:2401.01335, 2024 - arxiv.org
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …

Camel: Communicative agents for "mind" exploration of large language model society

G Li, H Hammoud, H Itani… - Advances in Neural …, 2023 - proceedings.neurips.cc
The rapid advancement of chat-based language models has led to remarkable progress in
complex task-solving. However, their success heavily relies on human input to guide the …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Language models can solve computer tasks

G Kim, P Baldi, S McAleer - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Agents capable of carrying out general tasks on a computer can improve efficiency and
productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally …