Decoding-time language model alignment with multiple objectives

R Shi, Y Chen, Y Hu, A Liu, H Hajishirzi… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning language models (LMs) to human preferences has emerged as a critical pursuit,
enabling these models to better serve diverse user needs. Existing methods primarily focus …

Alignment of diffusion models: Fundamentals, challenges, and future

B Liu, S Shao, B Li, L Bai, Z Xu, H Xiong, J Kwok… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have emerged as the leading paradigm in generative modeling, excelling
in various applications. Despite their success, these models often misalign with human …

Direct alignment of language models via quality-aware self-refinement

R Yu, Y Wang, X Jiao, Y Zhang, JT Kwok - arXiv preprint arXiv:2405.21040, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has been commonly used to align
the behaviors of Large Language Models (LLMs) with human preferences. Recently, a …

Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking

C Laidlaw, S Singhal, A Dragan - arXiv preprint arXiv:2403.03185, 2024 - arxiv.org
Because it is difficult to precisely specify complex objectives, reinforcement learning policies
are often optimized using proxy reward functions that only approximate the true goal …

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

L Fluri, L Lang, A Abate, P Forré, D Krueger… - arXiv preprint arXiv …, 2024 - arxiv.org
In reinforcement learning, specifying reward functions that capture the intended task can be
very challenging. Reward learning aims to address this issue by learning the reward …

[PDF] Comparative Analysis of BERT Variants for Text Detection Tasks

X Zhang, L Zhao, J Wang, W Chen, YLH Sun - researchgate.net
Large language models, particularly those based on BERT, have shown notable
performance in various natural language processing tasks. This study focuses on comparing …