The inadequacy of reinforcement learning from human feedback - radicalizing large language models via semantic vulnerabilities

TR McIntosh, T Susnjak, T Liu, P Watters… - … on Cognitive and …, 2024 - ieeexplore.ieee.org
This study is an empirical investigation into the semantic vulnerabilities of four popular
pretrained commercial large language models (LLMs) to ideological manipulation. Using …

Stepcoder: Improve code generation with reinforcement learning from compiler feedback

S Dou, Y Liu, H Jia, L Xiong, E Zhou, W Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
The advancement of large language models (LLMs) has significantly propelled the field of
code generation. Previous work integrated reinforcement learning (RL) with compiler …

Direct preference optimization using sparse feature-level constraints

Q Yin, CT Leong, H Zhang, M Zhu, H Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
The alignment of large language models (LLMs) with human preferences remains a key
challenge. While post-training techniques like Reinforcement Learning from Human …

The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking

Y Miao, S Zhang, L Ding, Y Zhang, L Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
This work identifies the Energy Loss Phenomenon in Reinforcement Learning from Human
Feedback (RLHF) and its connection to reward hacking. Specifically, energy loss in the final …

Improving DeFi Accessibility through Efficient Liquidity Provisioning with Deep Reinforcement Learning

H Xu, A Brini - arXiv preprint arXiv:2501.07508, 2025 - arxiv.org
This paper applies deep reinforcement learning (DRL) to optimize liquidity provisioning in
Uniswap v3, a decentralized finance (DeFi) protocol implementing an automated market …

Ensuring trustworthy code: leveraging a static analyzer to identify and mitigate defects in generated code

DS Shaikhelislamov, MD Drobyshevskiy… - Записки научных …, 2024 - mathnet.ru
The rise of large language models (LLMs) has greatly advanced code generation
capabilities. A recent StackOverflow survey found that 70% of developers are using or …