Google Scholar

Secrets of RLHF in large language models part I: PPO

R Zheng, S Dou, S Gao, Y Hua, W Shen… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have formulated a blueprint for the advancement of artificial
general intelligence. Its primary objective is to function as a human-centric (helpful, honest …

Cited by 109. All 4 versions.

[PDF] openreview.net

Delve into PPO: Implementation matters for stable RLHF

R Zheng, S Dou, S Gao, Y Hua, W Shen… - … 2023 Workshop on …, 2023 - openreview.net

Large language models (LLMs) have formulated a blueprint for the advancement of artificial
general intelligence. Its primary objective is to function as a human-centric (helpful, honest …

Cited by 6.

Design of energy-saving driving strategy based on proximal policy optimization considering urban transport information

Q Liu, D Sun, H Chen, D Li, P Wang - Control Theory and Technology, 2024 - Springer

Eco-driving has always been an ongoing topic. In urban driving conditions, traffic
regulations, other vehicle behaviors, and special driving scenarios will have a major impact …

A New Decision-Making Approach via Monte Carlo Tree Search and A2C

T Ou, J Cao, Y Lu, Y Wang, X Wu - 2023 3rd International …, 2023 - ieeexplore.ieee.org

Monte Carlo Tree Search (MCTS) is a state-of-the-art algorithm suitable for decision-making
problem in adversarial complex environments. In this paper, aimed at the challenge of …

Cited by 1.

[HTML] mdpi.com

[HTML] A Needs Learning Algorithm Applied to Stable Gait Generation of Quadruped Robot

H Zhang, J Yin, H Wang - Sensors, 2022 - mdpi.com

Based on Maslow's hierarchy of needs theory, we have proposed a novel machine learning
algorithm that combines factors of the environment and its own needs to make decisions for …

Cited by 1. All 8 versions.

Easy RL: Reinforcement Learning Tutorial

Secrets of RLHF in large language models part I: PPO
