Google Scholar

A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making tasks.
However, safety concerns are raised during deploying RL in real-world …

Cited by 310 · All 3 versions

[PDF] kcl.ac.uk

A review of safe reinforcement learning: Methods, theories and applications

S Gu, L Yang, Y Du, G Chen, F Walter… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making tasks.
However, safety concerns are raised during deploying RL in real-world …

Cited by 22 · All 8 versions

[PDF] neurips.cc

Constrained update projection approach to safe policy optimization

L Yang, J Ji, J Dai, L Zhang, B Zhou… - Advances in …, 2022 - proceedings.neurips.cc

Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only
maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a …

Cited by 58 · All 10 versions

[PDF] github.io

An off-policy trust region policy optimization method with monotonic improvement guarantee for deep reinforcement learning

W Meng, Q Zheng, Y Shi, G Pan - IEEE Transactions on Neural …, 2021 - ieeexplore.ieee.org

In deep reinforcement learning, off-policy data help reduce on-policy interaction with the
environment, and the trust region policy optimization (TRPO) method is efficient to stabilize …

Cited by 55 · All 4 versions

[HTML] sciencedirect.com

[HTML] Qualitative case-based reasoning and learning

TPD Homem, PE Santos, AHR Costa… - Artificial Intelligence, 2020 - Elsevier

The development of autonomous agents that perform tasks with the same dexterity as
performed by humans is one of the challenges of artificial intelligence and robotics. This …

Cited by 62 · All 14 versions

[PDF] springer.com

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer

In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …

Cited by 37 · All 12 versions

[PDF] aaai.org

Policy optimization with stochastic mirror descent

L Yang, Y Zhang, G Zheng, Q Zheng, P Li… - Proceedings of the …, 2022 - ojs.aaai.org

Improving sample efficiency has been a longstanding goal in reinforcement learning. This
paper proposes VRMPO algorithm: a sample efficient policy gradient method with stochastic …

Cited by 39 · All 4 versions

[PDF] arxiv.org

CUP: A conservative update policy algorithm for safe reinforcement learning

L Yang, J Ji, J Dai, Y Zhang, P Li, G Pan - arXiv preprint arXiv:2202.07565, 2022 - arxiv.org

Safe reinforcement learning (RL) is still very challenging since it requires the agent to
consider both return maximization and safe exploration. In this paper, we propose CUP, a …

Cited by 21 · All 3 versions

[PDF] aaai.org

Sample complexity of policy gradient finding second-order stationary points

L Yang, Q Zheng, G Pan - Proceedings of the AAAI Conference on …, 2021 - ojs.aaai.org

The policy-based reinforcement learning (RL) can be considered as maximization of its
objective. However, due to the inherent non-concavity of its objective, the policy gradient …

Cited by 27 · All 6 versions

Variance aware reward smoothing for deep reinforcement learning

Y Dong, S Zhang, X Liu, Y Zhang, T Shen - Neurocomputing, 2021 - Elsevier

A Reinforcement Learning (RL) agent interacts with the environment to learn a
policy with high accumulated rewards through attempts and failures. However, RL suffers …

Cited by 21 · All 2 versions

Saved in "My library"

A unified approach for multi-step temporal-difference learning with eligibility traces in...

A review of safe reinforcement learning: Methods, theory and applications