- Academic Search

The benefits, risks and bounds of personalizing the alignment of large language models to individuals

HR Kirk, B Vidgen, P Röttger, SA Hale - Nature Machine Intelligence, 2024 - nature.com

Large language models (LLMs) undergo 'alignment' so that they better reflect human values
or preferences, and are safer or more useful. However, alignment is intrinsically difficult …

Cited by 77 · Related articles · All 2 versions

[PDF] arxiv.org

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - ar…

… large language models (LLMs), LLMs often learn an averaged human preference and struggle to model …

Cited by 16 · Related articles · All 7 versions · HTML version

[PDF] arxiv.org

Personalizing reinforcement learning from human feedback with variational preference learning

S Poddar, Y Wan, H Ivison, A Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org

Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning
foundation models to human values and preferences. However, current RLHF techniques …

Cited by 10 · Related articles · All 4 versions · HTML version

[PDF] arxiv.org

Are Large Language Models Consistent over Value-laden Questions?

J Moore, T Deshpande, D Yang - arXiv preprint arXiv:2407.02996, 2024 - arxiv.org

Large language models (LLMs) appear to bias their survey answers toward certain values.
Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are …

Cited by 9 · Related articles · All 4 versions · HTML version

[PDF] arxiv.org

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

H Wang, W Fu, Y Tang, Z Chen, Y Huang… - arXiv preprint arXiv …, 2025 - arxiv.org

While large language models (LLMs) present significant potential for supporting numerous
real-world applications and delivering positive social impacts, they still face significant …

Cited by 1 · Related articles · All 2 versions · HTML version

[PDF] arxiv.org

Culture-gen: Revealing global cultural perception in language models through natural language prompting

H Li, L Jiang, JD Hwang, H Kim, S Santy… - arXiv preprint arXiv …, 2024 - arxiv.org

As the utilization of large language models (LLMs) has proliferated world-wide, it is crucial
for them to have adequate knowledge and fair representation for diverse global cultures. In …

Cited by 9 · Related articles · All 2 versions · HTML version

[PDF] springer.com

Beyond preferences in AI alignment

T Zhi-Xuan, M Carroll, M Franklin, H Ashton - Philosophical Studies, 2024 - Springer

The dominant practice of AI alignment assumes (1) that preferences are an adequate
representation of human values, (2) that human rationality can be understood in terms of …

Cited by 8 · Related articles · All 6 versions
