- Academic Search

Decoding-time language model alignment with multiple objectives

R Shi, Y Chen, Y Hu, A Liu, H Hajishirzi… - arXiv preprint arXiv …, 2024 - arxiv.org

Aligning language models (LMs) to human preferences has emerged as a critical pursuit,
enabling these models to better serve diverse user needs. Existing methods primarily focus …

Cited by 10

Alignment of diffusion models: Fundamentals, challenges, and future

B Liu, S Shao, B Li, L Bai, Z Xu, H Xiong, J Kwok… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion models have emerged as the leading paradigm in generative modeling, excelling
in various applications. Despite their success, these models often misalign with human …

Cited by 7

Direct alignment of language models via quality-aware self-refinement

R Yu, Y Wang, X Jiao, Y Zhang, JT Kwok - arXiv preprint arXiv:2405.21040, 2024 - arxiv.org

Reinforcement Learning from Human Feedback (RLHF) has been commonly used to align
the behaviors of Large Language Models (LLMs) with human preferences. Recently, a …

Cited by 2

Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking

C Laidlaw, S Singhal, A Dragan - arXiv preprint arXiv:2403.03185, 2024 - arxiv.org

Because it is difficult to precisely specify complex objectives, reinforcement learning policies
are often optimized using proxy reward functions that only approximate the true goal …

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

L Fluri, L Lang, A Abate, P Forré, D Krueger… - arXiv preprint arXiv …, 2024 - arxiv.org

In reinforcement learning, specifying reward functions that capture the intended task can be
very challenging. Reward learning aims to address this issue by learning the reward …

Comparative Analysis of BERT Variants for Text Detection Tasks

X Zhang, L Zhao, J Wang, W Chen, YLH Sun - researchgate.net

Large language models, particularly those based on BERT, have shown notable
performance in various natural language processing tasks. This study focuses on comparing …

Reward model learning vs. direct policy optimization: A comparative analysis of learning...

Decoding-time language model alignment with multiple objectives
