Deep reinforcement learning based energy management strategies for electrified vehicles: Recent advances and perspectives

H He, X Meng, Y Wang, A Khajepour, X An… - … and Sustainable Energy …, 2024 - Elsevier
Electrified vehicles provide an effective solution to address the unfavorable impacts of fossil
fuel use in the transportation sector. Energy management strategy (EMS) is the core …

Applying artificial intelligence in cryptocurrency markets: A survey

R Amirzadeh, A Nazari, D Thiruvady - Algorithms, 2022 - mdpi.com
The total capital in cryptocurrency markets is around two trillion dollars in 2022, which is
almost the same as Apple's market capitalisation at the same time. Increasingly …

Behavior proximal policy optimization

Z Zhuang, K Lei, J Liu, D Wang, Y Guo - arXiv preprint arXiv:2302.11312, 2023 - arxiv.org
Offline reinforcement learning (RL) is a challenging setting where existing off-policy actor-
critic methods perform poorly due to the overestimation of out-of-distribution state-action …

A mixed perception-based human-robot collaborative maintenance approach driven by augmented reality and online deep reinforcement learning

C Liu, Z Zhang, D Tang, Q Nie, L Zhang… - Robotics and Computer …, 2023 - Elsevier
Owing to the fact that the number and complexity of machines is increasing in Industry 4.0,
the maintenance process is more time-consuming and labor-intensive, which contains …

Relative entropy regularized sample-efficient reinforcement learning with continuous actions

Z Shang, R Li, C Zheng, H Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this article, a novel reinforcement learning (RL) approach, continuous dynamic policy
programming (CDPP), is proposed to tackle the issues of both learning stability and sample …

Off-policy proximal policy optimization

W Meng, Q Zheng, G Pan, Y Yin - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Proximal Policy Optimization (PPO) is an important reinforcement learning method,
which has achieved great success in sequential decision-making problems. However, PPO …

Reliable PPO-based concurrent multipath transfer for time-sensitive applications

K Liu, W Quan, N Cheng, W Wu, Z Xu… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Time-sensitive applications, e.g., Internet of vehicles applications and tactile Internet
applications, put forward low latency and high throughput requirements for communication …

Wastewater treatment monitoring: Fault detection in sensors using transductive learning and improved reinforcement learning

J Yang, K Tian, H Zhao, Z Feng, S Bourouis… - Expert Systems with …, 2025 - Elsevier
Wastewater treatment plants (WWTPs) increasingly utilize sensors to optimize operations
and ensure treated water quality. These sensors' rich datasets are well-suited for automated …

Trust region-based safe distributional reinforcement learning for multiple constraints

D Kim, K Lee, S Oh - Advances in neural information …, 2023 - proceedings.neurips.cc
In safety-critical robotic tasks, potential failures must be reduced, and multiple constraints
must be met, such as avoiding collisions, limiting energy consumption, and maintaining …

Efficient off-policy safe reinforcement learning using trust region conditional value at risk

D Kim, S Oh - IEEE Robotics and Automation Letters, 2022 - ieeexplore.ieee.org
This letter aims to solve a safe reinforcement learning (RL) problem with risk measure-based
constraints. As risk measures, such as conditional value at risk (CVaR), focus on the tail …