SimPO: Simple preference optimization with a reference-free reward

Y Meng, M Xia, D Chen - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Direct Preference Optimization (DPO) is a widely used offline preference
optimization algorithm that reparameterizes reward functions in reinforcement learning from …
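
The reference-free reward in SimPO is the length-normalized log-likelihood of a response under the policy, compared between the chosen and rejected responses with a target margin. Below is a minimal PyTorch-style sketch of that objective, assuming the inputs are summed token log-probabilities and response lengths; the function name and hyperparameter values are illustrative, not the authors' code.

import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_len, rejected_len,
               beta=2.0, gamma=1.0):
    # Implicit reward: average per-token log-probability, scaled by beta.
    chosen_reward = beta * chosen_logps / chosen_len
    rejected_reward = beta * rejected_logps / rejected_len
    # Bradley-Terry-style loss with a target reward margin gamma;
    # no reference model is needed.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()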

Large language models are effective text rankers with pairwise ranking prompting

Z Qin, R Jagerman, K Hui, H Zhuang, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Ranking documents using Large Language Models (LLMs) by directly feeding the query and
candidate documents into the prompt is an interesting and practical problem. However …
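
In pairwise ranking prompting, the LLM sees the query together with exactly two candidate documents per prompt and states which one is more relevant; a full ranking is then assembled by aggregating these judgments, for example by counting wins over all pairs. A minimal sketch under that reading follows, where llm_compare is a hypothetical callable (not an API from the paper) that returns 0 or 1 for the preferred candidate.

from itertools import combinations

def pairwise_rank(query, docs, llm_compare):
    # llm_compare(query, doc_a, doc_b) -> 0 or 1: hypothetical LLM call
    # that prompts with the query and two candidates and returns the
    # index of the passage judged more relevant.
    wins = [0] * len(docs)
    for i, j in combinations(range(len(docs)), 2):
        # Ask in both orders to dampen position bias in the prompt.
        wins[i if llm_compare(query, docs[i], docs[j]) == 0 else j] += 1
        wins[j if llm_compare(query, docs[j], docs[i]) == 0 else i] += 1
    # Rank documents by their number of pairwise wins.
    order = sorted(range(len(docs)), key=wins.__getitem__, reverse=True)
    return [docs[k] for k in order]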

Direct nash optimization: Teaching language models to self-improve with general preferences

C Rosset, CA Cheng, A Mitra, M Santacroce… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper studies post-training large language models (LLMs) using preference feedback
from a powerful oracle to help a model iteratively improve over itself. The typical approach …

LLM Comparator: Visual analytics for side-by-side evaluation of large language models

M Kahng, I Tenney, M Pushkarna, MX Liu… - Extended Abstracts of …, 2024 - dl.acm.org
Automatic side-by-side evaluation has emerged as a promising approach to evaluating the
quality of responses from large language models (LLMs). However, analyzing the results …

Building math agents with multi-turn iterative preference learning

W Xiong, C Shi, J Shen, A Rosenberg, Z Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …

A survey on human preference learning for large language models

R Jiang, K Chen, X Bai, Z He, J Li, M Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent surge of versatile large language models (LLMs) largely depends on aligning
increasingly capable foundation models with human intentions by preference learning …

Towards a unified view of preference learning for large language models: A survey

B Gao, F Song, Y Miao, Z Cai, Z Yang, L Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial
factors to achieve success is aligning the LLM's output with human preferences. This …

Prompt optimization with human feedback

X Lin, Z Dai, A Verma, SK Ng, P Jaillet… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable performances in various
tasks. However, the performance of LLMs heavily depends on the input prompt, which has …

Alignment of diffusion models: Fundamentals, challenges, and future

B Liu, S Shao, B Li, L Bai, Z Xu, H Xiong, J Kwok… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have emerged as the leading paradigm in generative modeling, excelling
in various applications. Despite their success, these models often misalign with human …

Filtered direct preference optimization

T Morimura, M Sakamoto, Y Jinnai, K Abe… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning
language models with human preferences. While the significance of dataset quality is …
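
For context, the objective being filtered here is the standard DPO loss, in which the implicit reward is the log-likelihood ratio between the policy and a frozen reference model. A minimal sketch follows, assuming summed token log-probabilities as inputs; the dataset-filtering step the paper adds is not reproduced here.

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: beta-scaled log-ratio of policy to reference model.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()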