A modified random network distillation algorithm and its application in USVs naval battle simulation

J Rao, X Xu, H Bian, J Chen, Y Wang, J Lei… - Ocean …, 2022 - Elsevier
Unmanned surface vessel (USV) operations will change the future form of maritime wars
profoundly, and one of the critical factors for victory is the cluster intelligence of USVs …
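
Random network distillation (RND), named in the title above, gives an agent an intrinsic reward equal to the error of a trainable predictor network that tries to match a fixed, randomly initialized target network. States the agent has visited often are predicted well and earn little bonus; novel states earn more. This is a minimal sketch of standard RND only, not the authors' modified algorithm. It assumes tiny linear maps in place of neural networks, and the names `LinearNet` and `RND` are hypothetical.

```python
import random


class LinearNet:
    """Tiny linear map R^n -> R^m, standing in for the networks used by RND."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]


class RND:
    def __init__(self, n_in, n_out=4, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.target = LinearNet(n_in, n_out, rng)     # fixed random network, never trained
        self.predictor = LinearNet(n_in, n_out, rng)  # trained to imitate the target
        self.lr = lr

    def _error(self, s):
        return [p - t for p, t in zip(self.predictor(s), self.target(s))]

    def intrinsic_reward(self, s):
        """Squared prediction error: stays large for rarely visited states."""
        return sum(e * e for e in self._error(s))

    def update(self, s):
        """One SGD step on the squared error w.r.t. the predictor weights."""
        err = self._error(s)
        for row, e in zip(self.predictor.w, err):
            for j, xj in enumerate(s):
                row[j] -= self.lr * 2.0 * e * xj  # d(e^2)/dw_ij = 2 * e_i * x_j


rnd = RND(n_in=3)
familiar, novel = [1.0, 0.0, 0.5], [0.0, 1.0, 0.0]
for _ in range(200):
    rnd.update(familiar)  # the agent keeps revisiting the same state
print(rnd.intrinsic_reward(familiar), rnd.intrinsic_reward(novel))
```

After repeated visits the familiar state's bonus decays toward zero, while the unvisited state keeps its initial prediction error.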

Balanced prioritized experience replay in off-policy reinforcement learning

Z Lou, Y Wang, S Shan, K Zhang, H Wei - Neural Computing and …, 2024 - Springer
Abstract In Off-Policy reinforcement learning (RL), the experience imbalance problem can
affect learning performance. The experience imbalance problem refers to the phenomenon …
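
The imbalance this paper targets arises in prioritized replay, where sampling follows the TD error rather than the uniform order of the buffer. In standard proportional prioritized experience replay, transition i gets priority p_i = (|δ_i| + ε)^α from its TD error δ_i, is drawn with probability P(i) = p_i / Σ_k p_k, and is reweighted by the importance-sampling weight w_i = (N·P(i))^(-β), here normalized by the batch maximum. The list-based sketch below shows only that standard scheme, not the balanced variant proposed in the paper. The class `PrioritizedReplay` and its default α, β, ε values are illustrative assumptions, and a real buffer would use a sum-tree for O(log N) sampling.

```python
import random


class PrioritizedReplay:
    def __init__(self, alpha=0.6, beta=0.4, eps=1e-3, seed=0):
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.data, self.prios = [], []
        self.rng = random.Random(seed)

    def _priority(self, td_error):
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, transition, td_error):
        self.data.append(transition)
        self.prios.append(self._priority(td_error))

    def probabilities(self):
        total = sum(self.prios)
        return [p / total for p in self.prios]

    def sample(self, k):
        """Draw k indices with replacement, proportional to priority."""
        probs = self.probabilities()
        n = len(self.data)
        idx = self.rng.choices(range(n), weights=probs, k=k)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = [(n * probs[i]) ** (-self.beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return [self.data[i] for i in idx], idx, weights

    def update(self, idx, td_errors):
        """Refresh priorities after the learner recomputes TD errors."""
        for i, d in zip(idx, td_errors):
            self.prios[i] = self._priority(d)


buf = PrioritizedReplay()
for i, td in enumerate([0.0, 0.0, 5.0]):
    buf.add(("transition", i), td)
batch, idx, weights = buf.sample(1000)
```

The high-error transition dominates the batch, and it receives the smallest importance weight to compensate.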

An AUV target-tracking method combining imitation learning and deep reinforcement learning

Y Mao, F Gao, Q Zhang, Z Yang - Journal of Marine Science and …, 2022 - mdpi.com
This study aims to solve the problem of sparse reward and local convergence when using a
reinforcement learning algorithm as the controller of an AUV. Based on the generative …

Hierarchical reinforcement learning with unlimited option scheduling for sparse rewards in continuous spaces

Z Huang, Q Liu, F Zhu, L Zhang, L Wu - Expert Systems with Applications, 2024 - Elsevier
The fundamental concept behind option-based hierarchical reinforcement learning (O-HRL)
is to obtain temporal coarse-grained actions and abstract complex situations. Although O …

An efficient planning method based on deep reinforcement learning with hybrid actions for autonomous driving on highway

M Zhang, K Chen, J Zhu - International Journal of Machine Learning and …, 2023 - Springer
Due to the complexity and uncertainty of traffic, planning for autonomous driving (AD) on
highways is challenging. Traditional planning algorithms suffer from low and …

Addressing hindsight bias in multigoal reinforcement learning

C Bai, L Wang, Y Wang, Z Wang… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
Multigoal reinforcement learning (RL) extends the typical RL with goal-conditional value
functions and policies. One efficient multigoal RL algorithm is the hindsight experience …
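
Hindsight experience replay (HER), mentioned above, copes with sparse goal-conditioned rewards by storing each transition again with its goal replaced by a state the agent actually reached later in the same episode, so some replayed transitions carry a success reward. This is a minimal sketch of plain HER with the "future" relabeling strategy, not the hindsight-bias correction the paper proposes. It assumes scalar states and treats the next state itself as the achieved goal; `sparse_reward` and `her_relabel` are hypothetical helpers.

```python
import random


def sparse_reward(achieved, goal, tol=1e-6):
    """0 if the achieved goal matches the desired goal, else -1 (sparse reward)."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0


def her_relabel(episode, k=2, seed=0):
    """episode: list of (state, action, next_state, goal) tuples.

    Returns (state, action, next_state, goal, reward) tuples: each original
    transition followed by k copies relabeled with goals achieved later.
    """
    rng = random.Random(seed)
    out = []
    T = len(episode)
    for t, (s, a, s2, g) in enumerate(episode):
        out.append((s, a, s2, g, sparse_reward(s2, g)))  # original transition
        for _ in range(k):
            future = rng.randint(t, T - 1)  # "future" strategy: a step >= t
            new_g = episode[future][2]      # a state that was actually reached
            out.append((s, a, s2, new_g, sparse_reward(s2, new_g)))
    return out


# A 1-D walk that never reaches the desired goal 10.
episode = [(0, 1, 1, 10), (1, 1, 2, 10), (2, 1, 3, 10)]
relabeled = her_relabel(episode, k=2)
```

Every original transition in this episode is a failure, yet the relabeled copies include successes (at minimum the last step relabeled with its own outcome), which gives the learner a usable signal.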

Self-learning-based multiple spacecraft evasion decision making simulation under sparse reward condition

Z Yu, J Guo, Y Peng, C Bai - Journal of …, 2021 - dc-china-simulation …
To improve the ability of a spacecraft formation to evade multiple interceptors, and to
address the low success rate of traditional procedural maneuver evasion, a multi-agent …

Motion planning of space robot obstacle avoidance based on DDPG algorithm

H Sang, S Wang - 2022 International Conference on Service …, 2022 - ieeexplore.ieee.org
To address the unstructured environment and complex operational tasks of a space robot,
this paper uses the DDPG algorithm, which is data-driven and model-free, in the …
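
Two update rules sit at the core of DDPG. The critic regresses toward the TD target y = r + γ(1 - d)·Q'(s', μ'(s')), computed with slowly moving target networks, and each target network tracks its online counterpart by Polyak averaging θ' ← τθ + (1 - τ)θ'. The sketch below shows only these two rules. It treats parameters as flat Python lists and takes Q'(s', μ'(s')) as an already evaluated number; `soft_update` and `critic_target` are hypothetical names, and the space-robot setting of the paper is not modeled.

```python
def soft_update(target, source, tau):
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


def critic_target(reward, done, q_next, gamma=0.99):
    """TD target y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    q_next is the target critic's value at the target actor's action;
    terminal transitions (done=True) do not bootstrap.
    """
    return reward + gamma * (1.0 - float(done)) * q_next


# With a small tau, the target parameters drift slowly toward the online ones.
online, target = [1.0], [0.0]
for _ in range(100):
    target = soft_update(target, online, tau=0.1)
```

A small τ keeps the bootstrapped targets stable across updates, which is the main reason DDPG keeps separate target networks.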

Research on sparse reward algorithms for reinforcement learning: theory and experiment

杨瑞, 严江鹏, 李秀 - 智能系统学报 (CAAI Transactions on Intelligent Systems), 2020 - html.rhhz.net
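In recent years, reinforcement learning has achieved great success in sequential decision-making
domains such as games and robot control. However, in many practical problems the reward signal is
extremely sparse, which makes it difficult for the agent to learn the optimal policy from its
interaction with the environment; this problem is called the sparse …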

Simulation Training System for Parafoil Motion Controller Based on Actor–Critic RL Approach

X He, J Liu, J Zhao, R Xu, Q Liu, J Wan, G Yu - Actuators, 2024 - mdpi.com
The unique ram air aerodynamic shape and control rope pulling course of the parafoil
system make it difficult to realize its precise control. At present, the commonly used control …