Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

H Shen, T Knearem, R Ghosh, K Alkiek… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in general-purpose AI have highlighted the importance of guiding AI
systems towards the intended goals, ethical principles, and values of individuals and …

From Persona to Personalization: A Survey on Role-Playing Language Agents

J Chen, X Wang, R Xu, S Yuan, Y Zhang, W Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have significantly boosted the rise
of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate …

Fine-Tuning Language Models with Advantage-Induced Policy Alignment

B Zhu, H Sharma, FV Frujeri, S Dong, C Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach
to aligning large language models (LLMs) to human preferences. Among the plethora of …

The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback

N Lambert, R Calandra - arXiv preprint arXiv:2311.00168, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique
to make large language models (LLMs) more capable in complex settings. RLHF proceeds …

[PDF][PDF] Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

D Jurgens - arXiv preprint arXiv:2406.09264, 2024 - 3dvar.com
Despite these numerous investigations into human-AI alignment, its definition and scope
remain ambiguous and inconsistent across literature, for example, regarding whom to align …

Provably Efficient Interactive-Grounded Learning with Personalized Reward

M Zhang, Y Zhang, H Luo, P Mineiro - arXiv preprint arXiv:2405.20677, 2024 - arxiv.org
Interaction-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a
learner aims at maximizing unobservable rewards through interacting with an environment …

An Information Theoretic Approach to Interaction-Grounded Learning

X Hu, F Farnia, H Leung - arXiv preprint arXiv:2401.05015, 2024 - arxiv.org
Reinforcement learning (RL) problems where the learner attempts to infer an unobserved
reward from some feedback variables have been studied in several recent papers. The …

Controllable Personalization for Information Access

S Mysore - 2024 - scholarworks.umass.edu
Information access systems mediate how we find and discover information in nearly
every walk of life. The ranking models powering these systems base their predictions on …

ICLR 2025 Workshop on Bidirectional Human-AI Alignment

H Shen, Z Ma, R Ghosh, T Knearem, MX Liu… - ICLR 2025 Workshop … - openreview.net
As AI systems grow more integrated into real-world applications, the traditional one-way
approach to AI alignment is proving insufficient. Bidirectional Human-AI Alignment proposes …

[PDF][PDF] Human-Interactive Robot Learning: Definition, Challenges, and Recommendations

K Baraka, TK Faulkner, E Biyik, B Serena… - 2018 - sannevw.github.io
Robot learning from humans has been proposed and researched for several decades as a
means to enable robots to learn new skills or adapt existing ones to new situations. Recent …