Google Scholar

A distributional code for value in dopamine-based reinforcement learning

W Dabney, Z Kurth-Nelson, N Uchida, CK Starkweather… - Nature, 2020 - nature.com

Since its introduction, the reward prediction error theory of dopamine has explained a wealth
of empirical phenomena, providing a unifying framework for understanding the …

Cited by 469 - All 12 versions

Distributional reinforcement learning in the brain

AS Lowet, Q Zheng, S Matias, J Drugowitsch… - Trends in …, 2020 - cell.com

Learning about rewards and punishments is critical for survival. Classical studies have
demonstrated an impressive correspondence between the firing of dopamine neurons in the …

Cited by 67 - All 13 versions

Conservative offline distributional reinforcement learning

Y Ma, D Jayaraman, O Bastani - Advances in neural …, 2021 - proceedings.neurips.cc

Many reinforcement learning (RL) problems in practice are offline, learning purely from
observational data. A key challenge is how to ensure the learned policy is safe, which …

Cited by 98 - All 7 versions

An analysis of quantile temporal-difference learning

M Rowland, R Munos, MG Azar, Y Tang… - Journal of Machine …, 2024 - jmlr.org

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement
learning algorithm that has proven to be a key component in several successful large-scale …

Cited by 33 - All 5 versions

Safety-constrained reinforcement learning with a distributional safety critic

Q Yang, TD Simão, SH Tindemans, MTJ Spaan - Machine Learning, 2023 - Springer

Safety is critical to broadening the real-world use of reinforcement learning. Modeling the
safety aspects using a safety-cost signal separate from the reward and bounding the …

Cited by 49 - All 13 versions

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc

When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Cited by 58 - All 11 versions

A feature-specific prediction error model explains dopaminergic heterogeneity

RS Lee, Y Sagiv, B Engelhard, IB Witten… - Nature Neuroscience, 2024 - nature.com

The hypothesis that midbrain dopamine (DA) neurons broadcast a reward prediction error
(RPE) is among the great successes of computational neuroscience. However, recent …

Cited by 28 - All 9 versions

An introduction to reinforcement learning for neuroscience

KT Jensen - arXiv preprint arXiv:2311.07315, 2023 - arxiv.org

Reinforcement learning has a rich history in neuroscience, from early work on dopamine as
a reward prediction error signal for temporal difference learning (Schultz et al., 1997) to …

Cited by 5 - All 5 versions

Offline reinforcement learning with value-based episodic memory

X Ma, Y Yang, H Hu, Q Liu, J Yang, C Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org

Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by
effectively utilizing previously collected data. Most existing offline RL algorithms use …

Cited by 47 - All 3 versions

Beyond average return in Markov decision processes

A Marthe, A Garivier, C Vernade - Advances in Neural …, 2023 - proceedings.neurips.cc

What are the functionals of the reward that can be computed and optimized exactly in
Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic …

Cited by 11 - All 9 versions

Statistics and samples in distributional reinforcement learning

A distributional code for value in dopamine-based reinforcement learning
