- Academic Search

Delve into PPO: Implementation matters for stable RLHF

The inadequacy of reinforcement learning from human feedback-radicalizing large language models via semantic vulnerabilities

TR McIntosh, T Susnjak, T Liu, P Watters… - … on Cognitive and …, 2024 - ieeexplore.ieee.org

This study is an empirical investigation into the semantic vulnerabilities of four popular
pretrained commercial large language models (LLMs) to ideological manipulation. Using …

Cited by: 89

[PDF] arxiv.org

Stepcoder: Improve code generation with reinforcement learning from compiler feedback

S Dou, Y Liu, H Jia, L Xiong, E Zhou, W Shen… - arxiv preprint arxiv …, 2024 - arxiv.org

The advancement of large language models (LLMs) has significantly propelled the field of
code generation. Previous work integrated reinforcement learning (RL) with compiler …

Cited by: 12 · All 7 versions

[PDF] arxiv.org

Direct preference optimization using sparse feature-level constraints

Q Yin, CT Leong, H Zhang, M Zhu, H Yan… - arxiv preprint arxiv …, 2024 - arxiv.org

The alignment of large language models (LLMs) with human preferences remains a key
challenge. While post-training techniques like Reinforcement Learning from Human …

Cited by: 4 · All 3 versions

[PDF] arxiv.org

The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking

Y Miao, S Zhang, L Ding, Y Zhang, L Zhang… - arxiv preprint arxiv …, 2025 - arxiv.org

This work identifies the Energy Loss Phenomenon in Reinforcement Learning from Human
Feedback (RLHF) and its connection to reward hacking. Specifically, energy loss in the final …

[PDF] arxiv.org

Improving DeFi Accessibility through Efficient Liquidity Provisioning with Deep Reinforcement Learning

H Xu, A Brini - arxiv preprint arxiv:2501.07508, 2025 - arxiv.org

This paper applies deep reinforcement learning (DRL) to optimize liquidity provisioning in
Uniswap v3, a decentralized finance (DeFi) protocol implementing an automated market …

All 4 versions

[PDF] mathnet.ru

Ensuring trustworthy code: leveraging a static analyzer to identify and mitigate defects in generated code

DS Shaikhelislamov, MD Drobyshevskiy… - Записки научных …, 2024 - mathnet.ru

The rise of large language models (LLMs) has greatly advanced code generation
capabilities. A recent StackOverflow survey found that 70% of developers are using or …

All 4 versions
