Secrets of RLHF in Large Language Models Part I: PPO

R Zheng, S Dou, S Gao, Y Hua, W Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have formulated a blueprint for the advancement of artificial
general intelligence. Its primary objective is to function as a human-centric (helpful, honest …

Delve into PPO: Implementation matters for stable RLHF

R Zheng, S Dou, S Gao, Y Hua, W Shen… - … 2023 Workshop on …, 2023 - openreview.net
Large language models (LLMs) have formulated a blueprint for the advancement of artificial
general intelligence. Its primary objective is to function as a human-centric (helpful, honest …

Design of energy-saving driving strategy based on proximal policy optimization considering urban transport information

Q Liu, D Sun, H Chen, D Li, P Wang - Control Theory and Technology, 2024 - Springer
Eco-driving has always been an ongoing topic. In urban driving conditions, traffic
regulations, other vehicle behaviors, and special driving scenarios will have a major impact …

A New Decision-Making Approach via Monte Carlo Tree Search and A2C

T Ou, J Cao, Y Lu, Y Wang, X Wu - 2023 3rd International …, 2023 - ieeexplore.ieee.org
Monte Carlo Tree Search (MCTS) is a state-of-the-art algorithm suitable for decision-making
problems in complex adversarial environments. In this paper, aimed at the challenge of …

A Needs Learning Algorithm Applied to Stable Gait Generation of Quadruped Robot

H Zhang, J Yin, H Wang - Sensors, 2022 - mdpi.com
Based on Maslow's hierarchy of needs theory, we have proposed a novel machine learning
algorithm that combines factors of the environment and its own needs to make decisions for …