Distributional reinforcement learning with monotonic splines

Y Luo, G Liu, H Duan, O Schulte… - … Conference on Learning …, 2021 - openreview.net
Distributional Reinforcement Learning (RL) differs from traditional RL by estimating the
distribution over returns to capture the intrinsic uncertainty of MDPs. One key challenge in …

Neural Sinkhorn gradient flow

H Zhu, F Wang, C Zhang, H Zhao, H Qian - arXiv preprint arXiv …, 2024 - arxiv.org
Wasserstein Gradient Flows (WGF) with respect to specific functionals have been widely
used in the machine learning literature. Recently, neural networks have been adopted to …

Monte Carlo tree search algorithms for risk-aware and multi-objective reinforcement learning

CF Hayes, M Reymond, DM Roijers, E Howley… - Autonomous Agents and …, 2023 - Springer
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user
is derived from a single execution of a policy. In these settings, making decisions based on …

Enhancing value function estimation through first-order state-action dynamics in offline reinforcement learning

YH Lien, PC Hsieh, TM Li, YS Wang - Forty-first International …, 2024 - openreview.net
In offline reinforcement learning (RL), updating the value function with the discrete-time
Bellman Equation often encounters challenges due to the limited scope of available data …

Expected scalarised returns dominance: a new solution concept for multi-objective decision making

CF Hayes, T Verstraeten, DM Roijers, E Howley… - Neural Computing and …, 2022 - Springer
In many real-world scenarios, the utility of a user is derived from a single execution of a
policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the …

Dopamine neurons encode a multidimensional probabilistic map of future reward

M Sousa, P Bujalski, BF Cruz, K Louie, D McNamee… - bioRxiv, 2023 - biorxiv.org
Learning to predict rewards is a fundamental driver of adaptive behavior. Midbrain
dopamine neurons (DANs) play a key role in such learning by signaling reward prediction …

Cooperative deep reinforcement learning policies for autonomous navigation in complex environments

GW Kim - IEEE Access, 2024 - ieeexplore.ieee.org
A critical part of achieving robust and safe navigation for mobile robots is selecting the right
navigation policies trained through simulation to operate effectively in real-world situations …

Distributional multi-objective decision making

W Röpke, CF Hayes, P Mannion, E Howley… - arXiv preprint arXiv …, 2023 - arxiv.org
For effective decision support in scenarios with conflicting objectives, sets of potentially
optimal solutions can be presented to the decision maker. We explore both what policies …

Bayesian distributional policy gradients

L Li, AA Faisal - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Distributional Reinforcement Learning (RL) maintains the entire probability
distribution of the reward-to-go, i.e. the return, providing more learning signals that account …

Utility-based reinforcement learning: Unifying single-objective and multi-objective reinforcement learning

P Vamplew, C Foale, CF Hayes, P Mannion… - arXiv preprint arXiv …, 2024 - arxiv.org
Research in multi-objective reinforcement learning (MORL) has introduced the utility-based
paradigm, which makes use of both environmental rewards and a function that defines the …