A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

Pessimistic value iteration for multi-task data sharing in Offline Reinforcement Learning

C Bai, L Wang, J Hao, Z Yang, B Zhao, Z Wang, X Li - Artificial Intelligence, 2024 - Elsevier
Offline Reinforcement Learning (RL) has shown promising results in learning a task-
specific policy from a fixed dataset. However, successful offline RL often relies heavily on the …

Robust quadrupedal locomotion via risk-averse policy learning

J Shi, C Bai, H He, L Han, D Wang… - … on Robotics and …, 2024 - ieeexplore.ieee.org
The robustness of legged locomotion is crucial for quadrupedal robots in challenging
terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged …

Ovd-explorer: Optimism should not be the sole pursuit of exploration in noisy environments

J Liu, Z Wang, Y Zheng, J Hao, C Bai, J Ye… - Proceedings of the …, 2024 - ojs.aaai.org
In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream
principle for directing exploration towards less explored areas, characterized by higher …

Mild policy evaluation for offline actor–critic

L Huang, B Dong, J Lu, W Zhang - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In offline actor–critic (AC) algorithms, the distributional shift between the training data and
target policy causes optimistic value estimates for out-of-distribution (OOD) actions. This …

ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning

K Wu, Y Zhao, Z Xu, Z Che, C Yin… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Offline reinforcement learning (RL), which operates solely on static datasets without further
interactions with the environment, provides an appealing alternative to learning a safe and …

SinKD: Sinkhorn Distance Minimization for Knowledge Distillation

X Cui, Y Qin, Y Gao, E Zhang, Z Xu… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Knowledge distillation (KD) has been widely adopted to compress large language models
(LLMs). Existing KD methods investigate various divergence measures including the …

Motion planner with fixed-horizon constrained reinforcement learning for complex autonomous driving scenarios

K Lin, Y Li, S Chen, D Li, X Wu - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In autonomous driving, behavioral decision-making and trajectory planning remain huge
challenges due to the large amount of uncertainty in environments and complex interaction …

Outlier-adaptive-based non-crossing quantiles method for day-ahead electricity price forecasting

Z Chen, B Zhang, C Du, C Yang, W Gui - Applied Energy, 2025 - Elsevier
In deregulated electricity markets, accurate and reliable day-ahead electricity price
forecasting (EPF) is beneficial for hedging volatility risks, implementing dispatch controls …

A Cybersecure Distribution-Free Learning Model for Interval Forecasting of Power Load Under Cyberattacks

Z Chen, C Du, B Zhang, C Yang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Reliable interval forecasting of uncertain load with cyberattacks is critical for grid resilience
and decision-making. Nevertheless, various constraints on the predictive distribution …