[BOOK] Distributional reinforcement learning

MG Bellemare, W Dabney, M Rowland - 2023 - books.google.com
The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …

An alternative to variance: Gini deviation for risk-averse policy gradient

Y Luo, G Liu, P Poupart, Y Pan - Advances in Neural …, 2023 - proceedings.neurips.cc
Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement
Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional …

A unified framework for alternating offline model training and policy learning

S Yang, S Zhang, Y Feng… - Advances in Neural …, 2022 - proceedings.neurips.cc
In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model
from historically collected data, and subsequently utilize the learned model and fixed …

A behavior regularized implicit policy for offline reinforcement learning

S Yang, Z Wang, H Zheng, Y Feng, M Zhou - ar** and
challenging area in embodied AI. It is crucial for advancing next-generation intelligent robots …

Risk-conditioned distributional soft actor-critic for risk-sensitive navigation

J Choi, C Dance, JE Kim, S Hwang… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Modern navigation algorithms based on deep reinforcement learning (RL) show promising
efficiency and robustness. However, most deep RL algorithms operate in a risk-neutral …

Non-decreasing quantile function network with efficient exploration for distributional reinforcement learning

F Zhou, Z Zhu, Q Kuang, L Zhang - arXiv preprint arXiv:2105.06696, 2021 - arxiv.org
Although distributional reinforcement learning (DRL) has been widely examined in the past
few years, there are two open questions people are still trying to address. One is how to …

A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities

Y Liu, X Cao, T Chen, Y Jiang, J You, M Wu… - arXiv preprint arXiv …, 2025 - arxiv.org
Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …

A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization

Y Luo, Y Pan, H Wang, P Torr, P Poupart - arXiv preprint arXiv:2403.11062, 2024 - arxiv.org
Reinforcement learning algorithms utilizing policy gradients (PG) to optimize Conditional
Value at Risk (CVaR) face significant challenges with sample inefficiency, hindering their …