A distributional code for value in dopamine-based reinforcement learning
Since its introduction, the reward prediction error theory of dopamine has explained a wealth
of empirical phenomena, providing a unifying framework for understanding the …
Distributional reinforcement learning in the brain
Learning about rewards and punishments is critical for survival. Classical studies have
demonstrated an impressive correspondence between the firing of dopamine neurons in the …
Conservative offline distributional reinforcement learning
Many reinforcement learning (RL) problems in practice are offline, learning purely from
observational data. A key challenge is how to ensure the learned policy is safe, which …
An analysis of quantile temporal-difference learning
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement
learning algorithm that has proven to be a key component in several successful large-scale …
Safety-constrained reinforcement learning with a distributional safety critic
Safety is critical to broadening the real-world use of reinforcement learning. Modeling the
safety aspects using a safety-cost signal separate from the reward and bounding the …
Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
A feature-specific prediction error model explains dopaminergic heterogeneity
RS Lee, Y Sagiv, B Engelhard, IB Witten… - Nature neuroscience, 2024 - nature.com
The hypothesis that midbrain dopamine (DA) neurons broadcast a reward prediction error
(RPE) is among the great successes of computational neuroscience. However, recent …
An introduction to reinforcement learning for neuroscience
KT Jensen - arXiv preprint arXiv:2311.07315, 2023 - arxiv.org
Reinforcement learning has a rich history in neuroscience, from early work on dopamine as
a reward prediction error signal for temporal difference learning (Schultz et al., 1997) to …
Offline reinforcement learning with value-based episodic memory
Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by
effectively utilizing previously collected data. Most existing offline RL algorithms use …
Beyond average return in Markov decision processes
A Marthe, A Garivier, C Vernade - Advances in Neural …, 2023 - proceedings.neurips.cc
What are the functionals of the reward that can be computed and optimized exactly in
Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic …