The benefits, risks and bounds of personalizing the alignment of large language models to individuals
Large language models (LLMs) undergo 'alignment' so that they better reflect human values
or preferences, and are safer or more useful. However, alignment is intrinsically difficult …
Personalizing reinforcement learning from human feedback with variational preference learning
Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning
foundation models to human values and preferences. However, current RLHF techniques …
Are Large Language Models Consistent over Value-laden Questions?
Large language models (LLMs) appear to bias their survey answers toward certain values.
Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are …
A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
While large language models (LLMs) present significant potential for supporting numerous
real-world applications and delivering positive social impacts, they still face significant …
Culture-gen: Revealing global cultural perception in language models through natural language prompting
As the utilization of large language models (LLMs) has proliferated world-wide, it is crucial
for them to have adequate knowledge and fair representation for diverse global cultures. In …
Beyond preferences in AI alignment
The dominant practice of AI alignment assumes (1) that preferences are an adequate
representation of human values, (2) that human rationality can be understood in terms of …