Reinforcement learning from human feedback with active queries

K Ji, J He, Q Gu - arXiv preprint arXiv:2402.09401, 2024 - arxiv.org
Aligning large language models (LLMs) with human preferences plays a key role in building
modern generative models and can be achieved by reinforcement learning from human …

Borda regret minimization for generalized linear dueling bandits

Y Wu, T Jin, H Lou, F Farnoud, Q Gu - arXiv preprint arXiv:2303.08816, 2023 - arxiv.org
Dueling bandits are widely used to model preferential feedback prevalent in many
applications such as recommendation systems and ranking. In this paper, we study the …
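(For reference, the Borda score of an arm $i$ among $K$ arms is commonly defined as $B(i) = \frac{1}{K-1}\sum_{j\neq i} p_{ij}$, where $p_{ij}$ is the probability that $i$ wins a duel against $j$, and Borda regret over $T$ rounds compares the dueled pairs $(i_t, j_t)$ against the Borda winner $i^* = \arg\max_i B(i)$, e.g. $\sum_{t=1}^{T}\big(B(i^*) - \tfrac{1}{2}(B(i_t)+B(j_t))\big)$; the exact normalization used in the generalized linear setting studied here may differ.)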

Sparse pairwise re-ranking with pre-trained transformers

L Gienapp, M Fröbe, M Hagen, M Potthast - Proceedings of the 2022 …, 2022 - dl.acm.org
Pairwise re-ranking models predict which of two documents is more relevant to a query and
then aggregate a final ranking from such preferences. This is often more effective than …
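As a concrete illustration of the aggregation step described above, a minimal Python sketch that turns pairwise preferences into a ranking by counting wins (a plain Copeland-style aggregation, not the sparse scheme proposed in the paper; prefer(a, b) is a hypothetical callable standing in for a pairwise re-ranking model):

from collections import defaultdict
from itertools import combinations

def rank_by_wins(docs, prefer):
    # Aggregate pairwise preferences into a ranking by win counts.
    # docs: list of document ids; prefer(a, b): returns True if a is
    # judged more relevant than b (hypothetical pairwise model output).
    wins = defaultdict(int)
    for a, b in combinations(docs, 2):
        if prefer(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    return sorted(docs, key=lambda d: wins[d], reverse=True)

# Toy usage with a preference oracle that favors the lexicographically smaller id.
print(rank_by_wins(["d1", "d2", "d3"], lambda a, b: a < b))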

Active ranking without strong stochastic transitivity

H Lou, T Jin, Y Wu, P Xu, Q Gu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Ranking from noisy comparisons is of great practical interest in machine learning. In this
paper, we consider the problem of recovering the exact full ranking for a list of items under …
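(For reference, strong stochastic transitivity (SST) is usually stated as: if $p_{ij}$ denotes the probability that item $i$ beats item $j$ and the underlying order satisfies $i \succ j \succ k$, then $p_{ik} \ge \max(p_{ij}, p_{jk})$. Per the title, exact ranking recovery is pursued without this assumption; the statement above is the textbook form and may differ in detail from the paper's precise conditions.)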

Ranking a Set of Objects using Heterogeneous Workers: QUITE an Easy Problem

A Nordio, E Leonardi - arXiv preprint arXiv:2310.02016, 2023 - arxiv.org
We focus on the problem of ranking $N$ objects starting from a set of noisy pairwise
comparisons provided by a crowd of unequal workers, each worker being characterized by a …
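(One standard way to formalize unequal workers, which may differ from the model adopted here: each worker $w$ answers a queried comparison between $i$ and $j$ correctly with probability $1-\delta_w$, i.e. $\Pr[w \text{ reports } i \succ j \mid i \succ j] = 1-\delta_w$, so the aggregation must estimate per-worker error rates alongside the item order.)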

Learning from Human Feedback: Ranking, Bandit, and Preference Optimization

Y Wu - 2024 - search.proquest.com
This dissertation investigates several challenges in artificial intelligence (AI) alignment and
reinforcement learning (RL), particularly focusing on applications where only preference …