Distributional reinforcement learning
The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …
An alternative to variance: Gini deviation for risk-averse policy gradient
Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement
Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional …
A unified framework for alternating offline model training and policy learning
In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model
from historically collected data, and subsequently utilize the learned model and fixed …
Risk-conditioned distributional soft actor-critic for risk-sensitive navigation
Modern navigation algorithms based on deep reinforcement learning (RL) show promising
efficiency and robustness. However, most deep RL algorithms operate in a risk-neutral …
Non-decreasing quantile function network with efficient exploration for distributional reinforcement learning
Although distributional reinforcement learning (DRL) has been widely examined in the past
few years, there are two open questions people are still trying to address. One is how to …
A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities
Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …
A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization
Reinforcement learning algorithms utilizing policy gradients (PG) to optimize Conditional
Value at Risk (CVaR) face significant challenges with sample inefficiency, hindering their …