Distributional reinforcement learning in the brain

AS Lowet, Q Zheng, S Matias, J Drugowitsch… - Trends in …, 2020 - cell.com
Learning about rewards and punishments is critical for survival. Classical studies have
demonstrated an impressive correspondence between the firing of dopamine neurons in the …

A distributional code for value in dopamine-based reinforcement learning

W Dabney, Z Kurth-Nelson, N Uchida, CK Starkweather… - Nature, 2020 - nature.com
Since its introduction, the reward prediction error theory of dopamine has explained a wealth
of empirical phenomena, providing a unifying framework for understanding the …

Conservative offline distributional reinforcement learning

Y Ma, D Jayaraman, O Bastani - Advances in Neural …, 2021 - proceedings.neurips.cc
Many reinforcement learning (RL) problems in practice are offline, learning purely from
observational data. A key challenge is how to ensure the learned policy is safe, which …

A feature-specific prediction error model explains dopaminergic heterogeneity

RS Lee, Y Sagiv, B Engelhard, IB Witten… - Nature Neuroscience, 2024 - nature.com
The hypothesis that midbrain dopamine (DA) neurons broadcast a reward prediction error
(RPE) is among the great successes of computational neuroscience. However, recent …

Safety-constrained reinforcement learning with a distributional safety critic

Q Yang, TD Simão, SH Tindemans, MTJ Spaan - Machine Learning, 2023 - Springer
Safety is critical to broadening the real-world use of reinforcement learning. Modeling the
safety aspects using a safety-cost signal separate from the reward and bounding the …

[BOOK] Distributional reinforcement learning

MG Bellemare, W Dabney, M Rowland - 2023 - books.google.com
The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …

[PDF] An analysis of quantile temporal-difference learning

M Rowland, R Munos, MG Azar, Y Tang… - Journal of Machine …, 2024 - jmlr.org
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement
learning algorithm that has proven to be a key component in several successful large-scale …

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Offline reinforcement learning with value-based episodic memory

X Ma, Y Yang, H Hu, Q Liu, J Yang, C Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by
effectively utilizing previously collected data. Most existing offline RL algorithms use …

DFAC framework: Factorizing the value function via quantile mixture for multi-agent distributional Q-learning

WF Sun, CK Lee, CY Lee - International Conference on …, 2021 - proceedings.mlr.press
In fully cooperative multi-agent reinforcement learning (MARL) settings, the environments
are highly stochastic due to the partial observability of each agent and the continuously …