A survey on offline reinforcement learning: Taxonomy, review, and open problems
RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …
Pessimistic value iteration for multi-task data sharing in Offline Reinforcement Learning
Offline Reinforcement Learning (RL) has shown promising results in learning a task-
specific policy from a fixed dataset. However, successful offline RL often relies heavily on the …
Robust quadrupedal locomotion via risk-averse policy learning
The robustness of legged locomotion is crucial for quadrupedal robots in challenging
terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged …
OVD-Explorer: Optimism should not be the sole pursuit of exploration in noisy environments
In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream
principle for directing exploration towards less explored areas, characterized by higher …
Mild policy evaluation for offline actor–critic
In offline actor–critic (AC) algorithms, the distributional shift between the training data and
target policy causes optimistic value estimates for out-of-distribution (OOD) actions. This …
ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning
Offline reinforcement learning (RL), which operates solely on static datasets without further
interactions with the environment, provides an appealing alternative to learning a safe and …
SinKD: Sinkhorn Distance Minimization for Knowledge Distillation
Knowledge distillation (KD) has been widely adopted to compress large language models
(LLMs). Existing KD methods investigate various divergence measures including the …
Motion planner with fixed-horizon constrained reinforcement learning for complex autonomous driving scenarios
In autonomous driving, behavioral decision-making and trajectory planning remain huge
challenges due to the large amount of uncertainty in environments and complex interaction …
Outlier-adaptive-based non-crossing quantiles method for day-ahead electricity price forecasting
In deregulated electricity markets, accurate and reliable day-ahead electricity price
forecasting (EPF) is beneficial for hedging volatility risks, implementing dispatch controls …
A Cybersecure Distribution-Free Learning Model for Interval Forecasting of Power Load Under Cyberattacks
Reliable interval forecasting of uncertain load with cyberattacks is critical for grid resilience
and decision-making. Nevertheless, various constraints on the predictive distribution …