Google Scholar

Distributional reinforcement learning with monotonic splines

Y Luo, G Liu, H Duan, O Schulte… - … Conference on Learning …, 2021 - openreview.net

Distributional Reinforcement Learning (RL) differs from traditional RL by estimating the
distribution over returns to capture the intrinsic uncertainty of MDPs. One key challenge in …

Cited by 21 · All versions (4)

[PDF] arxiv.org

Neural Sinkhorn gradient flow

H Zhu, F Wang, C Zhang, H Zhao, H Qian - arXiv preprint arXiv …, 2024 - arxiv.org

Wasserstein Gradient Flows (WGF) with respect to specific functionals have been widely
used in the machine learning literature. Recently, neural networks have been adopted to …

Cited by 6 · All versions (3)

[PDF] springer.com

Monte Carlo tree search algorithms for risk-aware and multi-objective reinforcement learning

CF Hayes, M Reymond, DM Roijers, E Howley… - Autonomous Agents and …, 2023 - Springer

In many risk-aware and multi-objective reinforcement learning settings, the utility of the user
is derived from a single execution of a policy. In these settings, making decisions based on …

Cited by 9 · All versions (7)

[PDF] openreview.net

Enhancing value function estimation through first-order state-action dynamics in offline reinforcement learning

YH Lien, PC Hsieh, TM Li, YS Wang - Forty-first International …, 2024 - openreview.net

In offline reinforcement learning (RL), updating the value function with the discrete-time
Bellman Equation often encounters challenges due to the limited scope of available data …

Cited by 1 · All versions (4)

[PDF] springer.com

Expected scalarised returns dominance: a new solution concept for multi-objective decision making

CF Hayes, T Verstraeten, DM Roijers, E Howley… - Neural Computing and …, 2022 - Springer

In many real-world scenarios, the utility of a user is derived from a single execution of a
policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the …

Cited by 18 · All versions (5)

[PDF] biorxiv.org

Dopamine neurons encode a multidimensional probabilistic map of future reward

M Sousa, P Bujalski, BF Cruz, K Louie, D McNamee… - bioRxiv, 2023 - biorxiv.org

Learning to predict rewards is a fundamental driver of adaptive behavior. Midbrain
dopamine neurons (DANs) play a key role in such learning by signaling reward prediction …

Cited by 8 · All versions (3)

[PDF] ieee.org

Cooperative deep reinforcement learning policies for autonomous navigation in complex environments

GW Kim - IEEE Access, 2024 - ieeexplore.ieee.org

A critical part of achieving robust and safe navigation for mobile robots is selecting the right
navigation policies trained through simulation to operate effectively in real-world situations …

Cited by 1 · All versions (2)

[PDF] arxiv.org

Distributional multi-objective decision making

W Röpke, CF Hayes, P Mannion, E Howley… - arXiv preprint arXiv …, 2023 - arxiv.org

For effective decision support in scenarios with conflicting objectives, sets of potentially
optimal solutions can be presented to the decision maker. We explore both what policies …

Cited by 6 · All versions (4)

[PDF] aaai.org

Bayesian distributional policy gradients

L Li, AA Faisal - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org

Distributional Reinforcement Learning (RL) maintains the entire probability
distribution of the reward-to-go, i.e., the return, providing more learning signals that account …

Cited by 14 · All versions (8)

[PDF] arxiv.org

Utility-based reinforcement learning: Unifying single-objective and multi-objective reinforcement learning

P Vamplew, C Foale, CF Hayes, P Mannion… - arXiv preprint arXiv …, 2024 - arxiv.org

Research in multi-objective reinforcement learning (MORL) has introduced the utility-based
paradigm, which makes use of both environmental rewards and a function that defines the …

Cited by 2 · All versions (10)
