Knowledge editing for large language models: A survey
Large Language Models (LLMs) have recently transformed both the academic and industrial
landscapes due to their remarkable capacity to understand, analyze, and generate texts …
Decoding ChatGPT: a taxonomy of existing research, current challenges, and possible future directions
Chat Generative Pre-trained Transformer (ChatGPT) has gained significant interest
and attention since its launch in November 2022. It has shown impressive performance in …
AlpacaFarm: A simulation framework for methods that learn from human feedback
Large language models (LLMs) such as ChatGPT have seen widespread adoption due to
their ability to follow user instructions well. Developing these LLMs involves a complex yet …
Using large language models to simulate multiple humans and replicate human subject studies
We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what
extent a given language model, such as GPT models, can simulate different aspects of …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
Evaluating verifiability in generative search engines
Generative search engines directly generate responses to user queries, along with in-line
citations. A prerequisite trait of a trustworthy generative search engine is verifiability, i.e. …
Towards understanding sycophancy in language models
Human feedback is commonly utilized to finetune AI assistants. But human feedback may
also encourage model responses that match user beliefs over truthful ones, a behaviour …
Evaluating the moral beliefs encoded in LLMs
This paper presents a case study on the design, administration, post-processing, and
evaluation of surveys on large language models (LLMs). It comprises two components: (1) A …
Diffusion model alignment using direct preference optimization
Large language models (LLMs) are fine-tuned using human comparison data with
Reinforcement Learning from Human Feedback (RLHF) methods to make them better …