A modified random network distillation algorithm and its application in USVs naval battle simulation
J Rao, X Xu, H Bian, J Chen, Y Wang, J Lei… - Ocean …, 2022 - Elsevier
Unmanned surface vessel (USV) operations will change the future form of maritime wars
profoundly, and one of the critical factors for victory is the cluster intelligence of USVs …
Balanced prioritized experience replay in off-policy reinforcement learning
In off-policy reinforcement learning (RL), the experience imbalance problem can
affect learning performance. The experience imbalance problem refers to the phenomenon …
An AUV target-tracking method combining imitation learning and deep reinforcement learning
Y Mao, F Gao, Q Zhang, Z Yang - Journal of Marine Science and …, 2022 - mdpi.com
This study aims to solve the problem of sparse reward and local convergence when using a
reinforcement learning algorithm as the controller of an AUV. Based on the generative …
Hierarchical reinforcement learning with unlimited option scheduling for sparse rewards in continuous spaces
Z Huang, Q Liu, F Zhu, L Zhang, L Wu - Expert Systems with Applications, 2024 - Elsevier
The fundamental concept behind option-based hierarchical reinforcement learning (O-HRL)
is to obtain temporal coarse-grained actions and abstract complex situations. Although O …
An efficient planning method based on deep reinforcement learning with hybrid actions for autonomous driving on highway
M Zhang, K Chen, J Zhu - International Journal of Machine Learning and …, 2023 - Springer
Due to the complexity and uncertainty of the traffic, planning for autonomous driving (AD) on
highway is challenging. Traditional planning algorithms have the problems of low and …
Addressing hindsight bias in multigoal reinforcement learning
Multigoal reinforcement learning (RL) extends the typical RL with goal-conditional value
functions and policies. One efficient multigoal RL algorithm is the hindsight experience …
Self-learning-based multiple spacecraft evasion decision making simulation under sparse reward condition
Z Yu, J Guo, Y Peng, C Bai - Journal of …, 2021 - dc-china-simulation …
In order to improve the ability of spacecraft formation to evade multiple interceptors, aiming
at the low success rate of traditional procedural maneuver evasion, a multi-agent …
Motion planning of space robot obstacle avoidance based on DDPG algorithm
H Sang, S Wang - 2022 International Conference on Service …, 2022 - ieeexplore.ieee.org
In order to solve the problem of unstructured environment and complex operation task of
space robot, this paper uses the DDPG algorithm, which is data-driven and model-free, in the …
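Research on sparse-reward algorithms for reinforcement learning: theory and experiment
杨瑞, 严江鹏, **秀 - 智能系统学报, 2020 - html.rhhz.net
In recent years, reinforcement learning has achieved great success in sequential decision-making domains such as games and robot control. In many practical problems, however,
the reward signal is very sparse, which makes it hard for the agent to learn an optimal policy from its interaction with the environment; this problem is known as the sparse …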
Simulation Training System for Parafoil Motion Controller Based on Actor–Critic RL Approach
X He, J Liu, J Zhao, R Xu, Q Liu, J Wan, G Yu - Actuators, 2024 - mdpi.com
The unique ram air aerodynamic shape and control rope pulling course of the parafoil
system make it difficult to realize its precise control. At present, the commonly used control …