- Academic Search

[BOOK][B] Distributional reinforcement learning

MG Bellemare, W Dabney, M Rowland - 2023 - books.google.com

The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …

Cited by 173

[PDF] neurips.cc

An alternative to variance: Gini deviation for risk-averse policy gradient

Y Luo, G Liu, P Poupart, Y Pan - Advances in Neural …, 2023 - proceedings.neurips.cc

Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement
Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional …

Cited by 9

[PDF] neurips.cc

A unified framework for alternating offline model training and policy learning

S Yang, S Zhang, Y Feng… - Advances in Neural …, 2022 - proceedings.neurips.cc

In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model
from historically collected data, and subsequently utilize the learned model and fixed …

Cited by 13

[PDF] arxiv.org

A behavior regularized implicit policy for offline reinforcement learning

S Yang, Z Wang, H Zheng, Y Feng, M Zhou - ar** and
challenging area in embodied AI. It is crucial for advancing next-generation intelligent robots …

Cited by 3

[PDF] arxiv.org

Risk-conditioned distributional soft actor-critic for risk-sensitive navigation

J Choi, C Dance, JE Kim, S Hwang… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org

Modern navigation algorithms based on deep reinforcement learning (RL) show promising
efficiency and robustness. However, most deep RL algorithms operate in a risk-neutral …

Cited by 27

[PDF] arxiv.org

Non-decreasing quantile function network with efficient exploration for distributional reinforcement learning

F Zhou, Z Zhu, Q Kuang, L Zhang - arxiv preprint arxiv:2105.06696, 2021 - arxiv.org

Although distributional reinforcement learning (DRL) has been widely examined in the past
few years, there are two open questions people are still trying to address. One is how to …

Cited by 23

[PDF] arxiv.org

A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities

Y Liu, X Cao, T Chen, Y Jiang, J You, M Wu… - arxiv preprint arxiv …, 2025 - arxiv.org

Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …

Cited by 1

[PDF] arxiv.org

A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization

Y Luo, Y Pan, H Wang, P Torr, P Poupart - arxiv preprint arxiv:2403.11062, 2024 - arxiv.org

Reinforcement learning algorithms utilizing policy gradients (PG) to optimize Conditional
Value at Risk (CVaR) face significant challenges with sample inefficiency, hindering their …

Cited by 2

Implicit distributional reinforcement learning

[BOOK][B] Distributional reinforcement learning
