A distributional code for value in dopamine-based reinforcement learning

W Dabney, Z Kurth-Nelson, N Uchida, CK Starkweather… - Nature, 2020 - nature.com
Since its introduction, the reward prediction error theory of dopamine has explained a wealth
of empirical phenomena, providing a unifying framework for understanding the …

Distributional reinforcement learning in the brain

AS Lowet, Q Zheng, S Matias, J Drugowitsch… - Trends in …, 2020 - cell.com
Learning about rewards and punishments is critical for survival. Classical studies have
demonstrated an impressive correspondence between the firing of dopamine neurons in the …

Conservative offline distributional reinforcement learning

Y Ma, D Jayaraman, O Bastani - Advances in Neural …, 2021 - proceedings.neurips.cc
Many reinforcement learning (RL) problems in practice are offline, learning purely from
observational data. A key challenge is how to ensure the learned policy is safe, which …

An analysis of quantile temporal-difference learning

M Rowland, R Munos, MG Azar, Y Tang… - Journal of Machine …, 2024 - jmlr.org
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement
learning algorithm that has proven to be a key component in several successful large-scale …

Safety-constrained reinforcement learning with a distributional safety critic

Q Yang, TD Simão, SH Tindemans, MTJ Spaan - Machine Learning, 2023 - Springer
Safety is critical to broadening the real-world use of reinforcement learning. Modeling the
safety aspects using a safety-cost signal separate from the reward and bounding the …

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

A feature-specific prediction error model explains dopaminergic heterogeneity

RS Lee, Y Sagiv, B Engelhard, IB Witten… - Nature Neuroscience, 2024 - nature.com
The hypothesis that midbrain dopamine (DA) neurons broadcast a reward prediction error
(RPE) is among the great successes of computational neuroscience. However, recent …

An introduction to reinforcement learning for neuroscience

KT Jensen - arXiv preprint arXiv:2311.07315, 2023 - arxiv.org
Reinforcement learning has a rich history in neuroscience, from early work on dopamine as
a reward prediction error signal for temporal difference learning (Schultz et al., 1997) to …

Offline reinforcement learning with value-based episodic memory

X Ma, Y Yang, H Hu, Q Liu, J Yang, C Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by
effectively utilizing previously collected data. Most existing offline RL algorithms use …

Beyond average return in Markov decision processes

A Marthe, A Garivier, C Vernade - Advances in Neural …, 2023 - proceedings.neurips.cc
What are the functionals of the reward that can be computed and optimized exactly in
Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic …