Personalizing reinforcement learning from human feedback with variational preference learning

S Poddar, Y Wan, H Ivison, A Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning
foundation models to human values and preferences. However, current RLHF techniques …

Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?

Y Jinnai - arXiv preprint arXiv:2406.16316, 2024 - arxiv.org
Aligning language models with human preferences is a common approach to making
them useful to end users. However, most alignment work is done in English …

ProgressGym: Alignment with a Millennium of Moral Progress

T Qiu, Y Zhang, X Huang, JX Li, J Ji, Y Yang - arXiv preprint arXiv …, 2024 - arxiv.org
Frontier AI systems, including large language models (LLMs), hold increasing influence over
the epistemology of human users. Such influence can reinforce prevailing societal values …

Online Learning from Strategic Human Feedback in LLM Fine-Tuning

S Hao, L Duan - arXiv preprint arXiv:2412.16834, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) has become an essential step in fine-
tuning large language models (LLMs) to align them with human preferences. However …

Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards

S Verma, N Boehmer, L Kong, M Tambe - arXiv preprint arXiv:2408.12112, 2024 - arxiv.org
LLMs are increasingly used to design reward functions based on human preferences in
Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed …

Representative Social Choice: From Learning Theory to AI Alignment

T Qiu - arXiv preprint arXiv:2410.23953, 2024 - arxiv.org
Social choice theory is the study of preference aggregation across a population, used both
in mechanism design for human agents and in the democratic alignment of language …

Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas

N Balepur, V Padmakumar, F Yang, S Feng… - arXiv preprint arXiv …, 2025 - arxiv.org
LLMs are tuned to follow instructions (aligned) by learning which of two outputs users prefer
for a prompt. However, this preference data format does not convey why users prefer …

Direct Preference Optimization With Unobserved Preference Heterogeneity

K Chidambaram, KV Seetharaman… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) has emerged as a pivotal step in aligning language models with human objectives
and values. It typically involves learning a reward model from human preference data and …

Can LLM be a Personalized Judge?

YR Dong, T Hu, N Collier - arXiv preprint arXiv:2406.11657, 2024 - arxiv.org
Ensuring that large language models (LLMs) reflect diverse user values and preferences is
crucial as their user bases expand globally. It is therefore encouraging to see the growing …

False consensus biases AI against vulnerable stakeholders

M Dong, JF Bonnefon, I Rahwan - arXiv preprint arXiv:2407.12143, 2024 - arxiv.org
The deployment of AI systems for welfare benefit allocation allows for accelerated decision-
making and faster provision of critical help, but has already led to an increase in unfair …