An off-policy trust region policy optimization method with monotonic improvement guarantee for deep reinforcement learning

W Meng, Q Zheng, Y Shi, G Pan - IEEE Transactions on Neural …, 2021 - ieeexplore.ieee.org
In deep reinforcement learning, off-policy data help reduce on-policy interaction with the
environment, and the trust region policy optimization (TRPO) method is efficient to stabilize …

Fuzzy-based predictive deep reinforcement learning for robust and constrained optimal control of industrial solar thermal plants

FB Tilahun - Applied Soft Computing, 2024 - Elsevier
Integrating distributed solar fields (DSFs) into conventional heat and power plants (CHPs) of
industries is mostly constrained by the availability of a real-time capable control scheme …

Learning-based scheduling of industrial hybrid renewable energy systems

PS Pravin, Z Luo, L Li, X Wang - Computers & Chemical Engineering, 2022 - Elsevier
The propagation of distributed renewable energy resources poses several challenges in the
operation of microgrids due to uncertainty. In traditional energy scheduling approaches, the …

Reducing impact of constant power loads on DC energy systems by artificial intelligence

M Gheisarnejad, A Akhbari, M Rahimi… - … on Circuits and …, 2022 - ieeexplore.ieee.org
Due to the negative impedance potential of constant power loads (CPLs), the stability of
power electronic converters-based electrical distribution networks is prone to instability. This …

Dual-arm robot trajectory planning based on deep reinforcement learning under complex environment

W Tang, C Cheng, H Ai, L Chen - Micromachines, 2022 - mdpi.com
In this article, the trajectory planning of the two manipulators of the dual-arm robot is studied
to approach the patient in a complex environment with deep reinforcement learning …

AI-based radio resource management and trajectory design for IRS-UAV-assisted PD-NOMA communication

HM Hariz, SSZ Mosaddegh, N Mokari… - … on Network and …, 2024 - ieeexplore.ieee.org
This paper proposes the use of unmanned aerial vehicles (UAVs) with intelligent reflecting
surfaces (IRS) to reflect signals from the industrial Internet of things (IIoT) to the destination …

A modified multi-agent proximal policy optimization algorithm for multi-objective dynamic partial-re-entrant hybrid flow shop scheduling problem

J Wu, Y Liu - Engineering Applications of Artificial Intelligence, 2025 - Elsevier
This paper extends a novel model for modern flexible manufacturing systems: the multi-
objective dynamic partial-re-entrant hybrid flow shop scheduling problem (MDPR-HFSP) …

Efficient Deployment of Partial Parallelized Service Function Chains in CPU+DPU-Based Heterogeneous NFV Platforms

R Wang, X Yu, Q Wu, C Yi, P Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The introduction of network function virtualization (NFV) leads to service function chain
(SFC) deployment problems, promoting the idea of composing network services as …

Automatic tracking control strategy of autonomous trains considering speed restrictions: Using the improved offline deep reinforcement learning method

W Liu, Q Feng, S Xiao, H Li - IEEE Access, 2024 - ieeexplore.ieee.org
Previous research on automatic control of high-speed trains in speed limit sections is
insufficient. This article proposes a new offline reinforcement learning strategy for automatic …

A novel intelligent anti-jamming algorithm based on deep reinforcement learning assisted by meta-learning for wireless communication systems

Q Chen, Y Niu, B Wan, P Xiang - Applied Sciences, 2023 - mdpi.com
In the field of intelligent anti-jamming, deep reinforcement learning algorithms are regarded
as key technical means. However, the learning process of deep reinforcement learning …