Weak-to-strong generalization: Eliciting strong capabilities with weak supervision

C Burns, P Izmailov, JH Kirchner, B Baker… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used alignment techniques, such as reinforcement learning from human feedback
(RLHF), rely on the ability of humans to supervise model behavior, for example, to evaluate …

From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Direct Nash optimization: Teaching language models to self-improve with general preferences

C Rosset, CA Cheng, A Mitra, M Santacroce… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper studies post-training large language models (LLMs) using preference feedback
from a powerful oracle to help a model iteratively improve over itself. The typical approach …

Self-exploring language models: Active preference elicitation for online alignment

S Zhang, D Yu, H Sharma, H Zhong, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …

Building math agents with multi-turn iterative preference learning

W Xiong, C Shi, J Shen, A Rosenberg, Z Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …

A survey on data synthesis and augmentation for large language models

K Wang, J Zhu, M Ren, Z Liu, S Li, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The success of Large Language Models (LLMs) is inherently linked to the availability of vast,
diverse, and high-quality data for training and evaluation. However, the growth rate of high …

RLHF workflow: From reward modeling to online RLHF

H Dong, W Xiong, B Pang, H Wang, H Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback
(RLHF) in this technical report, which is widely reported to outperform its offline counterpart …

Towards a unified view of preference learning for large language models: A survey

B Gao, F Song, Y Miao, Z Cai, Z Yang, L Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial
factors in this success is aligning the LLM's output with human preferences. This …

Filtered direct preference optimization

T Morimura, M Sakamoto, Y Jinnai, K Abe… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning
language models with human preferences. While the significance of dataset quality is …

Systematic evaluation of LLM-as-a-judge in LLM alignment tasks: Explainable metrics and diverse prompt templates

H Wei, S He, T Xia, A Wong, J Lin, M Han - arXiv preprint arXiv …, 2024 - arxiv.org
Alignment approaches such as RLHF and DPO are actively investigated to align large
language models (LLMs) with human preferences. Commercial large language models …