AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Human-in-the-loop reinforcement learning: A survey and position on requirements, challenges, and opportunities

CO Retzlaff, S Das, C Wayllace, P Mousavi… - Journal of Artificial …, 2024 - jair.org
Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to
enable agents to learn and perform tasks autonomously with superhuman performance …

Teaching language models to support answers with verified quotes

J Menick, M Trebacz, V Mikulik, J Aslanides… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent large language models often answer factual questions correctly. But users can't trust
any given claim a model makes without fact-checking, because language models can …

Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arXiv preprint arXiv …, 2023 - ai-plans.com
Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …

PEBBLE: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training

K Lee, L Smith, P Abbeel - arXiv preprint arXiv:2106.05091, 2021 - arxiv.org
Conveying complex objectives to reinforcement learning (RL) agents can often be difficult,
involving meticulous design of reward functions that are sufficiently informative yet easy …
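
A minimal sketch of the relabeling idea named in the PEBBLE title: each time the learned reward model is refit from fresh human preferences, the rewards stored in the off-policy replay buffer are recomputed so the agent always trains on labels consistent with the current model. The buffer layout and the reward_model.predict interface below are illustrative assumptions, not the paper's implementation.

    # Relabel stored transitions with the latest learned reward (sketch).
    def relabel_replay_buffer(buffer, reward_model):
        # buffer: iterable of dicts holding 'obs', 'action', 'reward'
        for transition in buffer:
            transition["reward"] = float(
                reward_model.predict(transition["obs"], transition["action"])
            )
        return buffer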

Deep reinforcement learning from human preferences

PF Christiano, J Leike, T Brown… - Advances in Neural …, 2017 - proceedings.neurips.cc
For sophisticated reinforcement learning (RL) systems to interact usefully with real-world
environments, we need to communicate complex goals to these systems. In this work, we …
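
The approach this abstract describes fits a reward model to pairwise human preferences over trajectory segments and then trains a standard RL agent against the learned reward. Below is a hedged sketch of that reward-learning step using a Bradley-Terry likelihood; the network shape, tensor layout, and hyperparameters are assumptions for illustration, not the paper's exact setup.

    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        # Small MLP scoring each observation; segment reward is the sum over time.
        def __init__(self, obs_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))

        def forward(self, segment):            # segment: (batch, time, obs_dim)
            return self.net(segment).squeeze(-1).sum(dim=-1)

    def preference_loss(model, seg_a, seg_b, prefs):
        # Bradley-Terry cross-entropy; prefs[i] = 1.0 when segment A was preferred.
        logits = model(seg_a) - model(seg_b)
        return nn.functional.binary_cross_entropy_with_logits(logits, prefs)

In use, segment pairs are shown to an annotator, the resulting labels are collected into prefs, the loss is minimized with any optimizer, and the RL agent then treats the model's output as its reward signal.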

Directly fine-tuning diffusion models on differentiable rewards

K Clark, P Vicol, K Swersky, DJ Fleet - arXiv preprint arXiv:2309.17400, 2023 - arxiv.org
We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-
tuning diffusion models to maximize differentiable reward functions, such as scores from …
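
A hedged sketch of the general idea the abstract names: fine-tune by generating a sample through differentiable denoising steps and backpropagating a differentiable reward into the model parameters. The denoiser call, the simplified update rule, and reward_fn below are placeholders rather than DRaFT's actual sampler or components.

    import torch

    def reward_finetune_step(denoiser, reward_fn, optimizer, noise, num_steps=10):
        x = noise
        for t in reversed(range(num_steps)):
            # Keep the autograd graph through every denoising step so the
            # reward gradient reaches the denoiser's parameters.
            eps = denoiser(x, t)
            x = x - eps / num_steps       # simplified update; real samplers differ
        loss = -reward_fn(x).mean()       # maximize reward = minimize its negation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()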

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …

Contrastive preference learning: Learning from human feedback without RL

J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically, RLHF algorithms operate in two …

A survey of preference-based reinforcement learning methods

C Wirth, R Akrour, G Neumann, J Fürnkranz - Journal of Machine Learning …, 2017 - jmlr.org
Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a
suitably chosen reward function. However, designing such a reward function often requires …