Weak-to-strong generalization: Eliciting strong capabilities with weak supervision
Widely used alignment techniques, such as reinforcement learning from human feedback
(RLHF), rely on the ability of humans to supervise model behavior, for example, to evaluate …
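As a rough, self-contained illustration of the weak-to-strong setup named in the title (a strong model trained on labels from a weaker supervisor, then compared against a ground-truth-trained ceiling), the sketch below uses small scikit-learn classifiers as stand-ins for the weak and strong models; the synthetic task, model choices, and split sizes are assumptions for illustration only, not the paper's experimental setup.

```python
# Toy weak-to-strong sketch: a "strong" model is trained on labels produced by
# a weaker supervisor, and we measure how much of the gap to a ground-truth-
# trained strong model is recovered (PGR). All model and data choices here are
# illustrative stand-ins, not the paper's actual setup.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_weak_train, y_weak_train = X[:200], y[:200]      # small set for the weak supervisor
X_sup, y_sup = X[200:2200], y[200:2200]            # pool the supervisor labels
X_test, y_test = X[2200:], y[2200:]                # held-out evaluation set

# Weak supervisor: a small model trained on little data.
weak = LogisticRegression(max_iter=200).fit(X_weak_train, y_weak_train)
weak_labels = weak.predict(X_sup)

# Strong student trained on the weak supervisor's (imperfect) labels.
strong_on_weak = GradientBoostingClassifier(random_state=0).fit(X_sup, weak_labels)
# Strong ceiling: the same model trained on ground-truth labels.
strong_ceiling = GradientBoostingClassifier(random_state=0).fit(X_sup, y_sup)

acc_weak = weak.score(X_test, y_test)
acc_w2s = strong_on_weak.score(X_test, y_test)
acc_ceiling = strong_ceiling.score(X_test, y_test)
gap = acc_ceiling - acc_weak
pgr = (acc_w2s - acc_weak) / gap if gap else float("nan")  # performance gap recovered
print(f"weak={acc_weak:.3f}  weak-to-strong={acc_w2s:.3f}  ceiling={acc_ceiling:.3f}  PGR={pgr:.2f}")
```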
From generation to judgment: Opportunities and challenges of LLM-as-a-judge
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …
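A minimal sketch of the LLM-as-a-judge pattern this survey discusses, assuming a pairwise comparison prompt and a stubbed judge call; the prompt wording and the call_judge_model helper are hypothetical, not an API from the paper.

```python
# Minimal sketch of pairwise LLM-as-a-judge evaluation: format a comparison
# prompt, send it to a judge model, and parse the verdict. The prompt text and
# the call_judge_model stub are illustrative assumptions, not a specific API.

JUDGE_TEMPLATE = """You are an impartial judge. Compare the two responses to the
user question below and decide which better follows the instructions.
Answer with exactly "A", "B", or "TIE".

[Question]
{question}

[Response A]
{answer_a}

[Response B]
{answer_b}

Verdict:"""

def call_judge_model(prompt: str) -> str:
    # Stub standing in for a real LLM call (e.g., an API or local model).
    return "A"

def judge_pair(question: str, answer_a: str, answer_b: str) -> str:
    prompt = JUDGE_TEMPLATE.format(question=question, answer_a=answer_a, answer_b=answer_b)
    verdict = call_judge_model(prompt).strip().upper()
    return verdict if verdict in {"A", "B", "TIE"} else "TIE"  # fall back on unparseable output

print(judge_pair("What is 2+2?", "4", "5"))
```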
Direct nash optimization: Teaching language models to self-improve with general preferences
This paper studies post-training large language models (LLMs) using preference feedback
from a powerful oracle to help a model iteratively improve over itself. The typical approach …
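A heavily simplified loop in the spirit of the setup described here: the current policy samples responses, a stronger oracle states pairwise preferences, and probability mass is shifted toward preferred outputs. The toy categorical policy and the length-based oracle rule are illustrative assumptions, not the paper's algorithm.

```python
# Heavily simplified self-improvement loop against a preference oracle: sample
# response pairs from the current policy, ask the oracle which it prefers, and
# shift probability mass toward winners. Toy components for illustration only.
import random

responses = ["short answer", "a somewhat longer answer", "a detailed, well-argued answer"]
policy = [1 / 3] * 3  # current policy: categorical distribution over responses

def oracle_prefers(a: int, b: int) -> int:
    # Stand-in oracle: prefers the more detailed response.
    return a if len(responses[a]) >= len(responses[b]) else b

random.seed(0)
for step in range(200):
    a, b = random.choices(range(3), weights=policy, k=2)
    if a == b:
        continue
    winner, loser = (a, b) if oracle_prefers(a, b) == a else (b, a)
    # Move a small amount of probability from the loser to the winner.
    delta = 0.02 * policy[loser]
    policy[winner] += delta
    policy[loser] -= delta

print({r: round(p, 2) for r, p in zip(responses, policy)})
```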
Self-exploring language models: Active preference elicitation for online alignment
Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …
Building math agents with multi-turn iterative preference learning
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …
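The tool-integration pattern mentioned in this snippet (delegating computation to a code interpreter) can be sketched roughly as follows; the generate_code stub stands in for an actual LLM call and is a hypothetical helper, not part of the paper.

```python
# Rough sketch of tool-integrated reasoning: the model emits a code snippet,
# the agent executes it, and the numeric result is folded into the answer.
# generate_code is a stub standing in for an actual LLM call.

def generate_code(problem: str) -> str:
    # A real agent would prompt an LLM to write this snippet.
    return "result = sum(i * i for i in range(1, 11))"

def solve_with_tool(problem: str) -> str:
    code = generate_code(problem)
    namespace: dict = {}
    # Execute with a restricted set of builtins; results land in `namespace`.
    exec(code, {"__builtins__": {"range": range, "sum": sum}}, namespace)
    return f"{problem} -> {namespace['result']}"

print(solve_with_tool("Compute the sum of squares of 1..10"))
```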
A survey on data synthesis and augmentation for large language models
K Wang, J Zhu, M Ren, Z Liu, S Li, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The success of Large Language Models (LLMs) is inherently linked to the availability of vast,
diverse, and high-quality data for training and evaluation. However, the growth rate of high …
RLHF workflow: From reward modeling to online RLHF
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback
(RLHF) in this technical report, which is widely reported to outperform its offline counterpart …
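A skeleton of an online iterative loop of the kind this report describes: each round, the current policy samples responses to fresh prompts, a reward model ranks them into chosen/rejected pairs, and the policy is updated on the new pairs; every component below is a toy stub assumed for illustration, not the report's implementation.

```python
# Skeleton of an online iterative RLHF-style loop: sample responses with the
# current policy, score them with a reward model, form best/worst preference
# pairs, and update the policy on the fresh pairs. Toy stubs throughout.
import random

random.seed(0)
prompts = ["Explain RLHF briefly.", "Summarize preference learning."]

def sample_responses(policy_version: int, prompt: str, k: int = 4) -> list[str]:
    return [f"v{policy_version} answer {i} to: {prompt}" for i in range(k)]

def reward_model(prompt: str, response: str) -> float:
    return random.random()  # stand-in for a learned reward model

def update_policy(policy_version: int, pairs: list[tuple[str, str, str]]) -> int:
    # A real implementation would run a preference-optimization step (e.g., DPO) here.
    return policy_version + 1

policy_version = 0
for round_idx in range(3):
    pairs = []
    for prompt in prompts:
        candidates = sample_responses(policy_version, prompt)
        ranked = sorted(candidates, key=lambda r: reward_model(prompt, r))
        pairs.append((prompt, ranked[-1], ranked[0]))  # (prompt, chosen, rejected)
    policy_version = update_policy(policy_version, pairs)
    print(f"round {round_idx}: collected {len(pairs)} pairs, policy now v{policy_version}")
```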
Towards a unified view of preference learning for large language models: A survey
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial
factors to achieve success is aligning the LLM's output with human preferences. This …
Filtered direct preference optimization
Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning
language models with human preferences. While the significance of dataset quality is …
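One way to realize the dataset-quality idea behind filtered DPO is to score each pair's chosen response with a reward model and drop low-scoring pairs before standard DPO training; the scoring stub and the fixed threshold below are assumptions for illustration, not necessarily the paper's exact filtering criterion.

```python
# Illustrative data filtering before DPO: score each pair's chosen response
# with a reward model and drop low-quality pairs, then train DPO only on the
# survivors. The scoring stub and threshold are illustrative assumptions.

preference_data = [
    {"prompt": "p1", "chosen": "good detailed answer", "rejected": "weak answer"},
    {"prompt": "p2", "chosen": "ok", "rejected": "bad"},
]

def reward_model_score(prompt: str, response: str) -> float:
    return float(len(response))  # stand-in for a learned reward model

THRESHOLD = 5.0  # minimum acceptable quality score for a chosen response

filtered = [
    ex for ex in preference_data
    if reward_model_score(ex["prompt"], ex["chosen"]) >= THRESHOLD
]
print(f"kept {len(filtered)} of {len(preference_data)} pairs for DPO training")
# The kept pairs would then be passed to a standard DPO training loop.
```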
Systematic evaluation of LLM-as-a-judge in LLM alignment tasks: Explainable metrics and diverse prompt templates
Alignment approaches such as RLHF and DPO are actively investigated to align large
language models (LLMs) with human preferences. Commercial large language models …