Distributional reinforcement learning in the brain
Learning about rewards and punishments is critical for survival. Classical studies have
demonstrated an impressive correspondence between the firing of dopamine neurons in the …
A distributional code for value in dopamine-based reinforcement learning
Since its introduction, the reward prediction error theory of dopamine has explained a wealth
of empirical phenomena, providing a unifying framework for understanding the …
Conservative offline distributional reinforcement learning
Many reinforcement learning (RL) problems in practice are offline, learning purely from
observational data. A key challenge is how to ensure the learned policy is safe, which …
A feature-specific prediction error model explains dopaminergic heterogeneity
The hypothesis that midbrain dopamine (DA) neurons broadcast a reward prediction error
(RPE) is among the great successes of computational neuroscience. However, recent …
Safety-constrained reinforcement learning with a distributional safety critic
Safety is critical to broadening the real-world use of reinforcement learning. Modeling the
safety aspects using a safety-cost signal separate from the reward and bounding the …
[BOOK][B] Distributional reinforcement learning
The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …
[PDF][PDF] An analysis of quantile temporal-difference learning
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement
learning algorithm that has proven to be a key component in several successful large-scale …
Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
Offline reinforcement learning with value-based episodic memory
Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by
effectively utilizing previously collected data. Most existing offline RL algorithms use …
DFAC framework: Factorizing the value function via quantile mixture for multi-agent distributional Q-learning
In fully cooperative multi-agent reinforcement learning (MARL) settings, the environments
are highly stochastic due to the partial observability of each agent and the continuously …