A tutorial on sparse Gaussian processes and variational inference

F Leibfried, V Dutordoir, ST John… - arXiv preprint arXiv …, 2020 - arxiv.org
Gaussian processes (GPs) provide a framework for Bayesian inference that can offer
principled uncertainty estimates for a large range of problems. For example, if we consider …
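A sketch of the sparse variational GP bound that such tutorials build towards, with notation assumed here rather than taken from the truncated snippet (inducing variables u with prior p(u), variational posterior q(u), observations (x_n, y_n)):
\mathcal{L} = \sum_{n=1}^{N} \mathbb{E}_{q(f(x_n))}\big[ \log p(y_n \mid f(x_n)) \big] - \mathrm{KL}\big( q(u) \,\|\, p(u) \big), \qquad q(f) = \int p(f \mid u)\, q(u)\, \mathrm{d}u.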

Bridging the gap between value and policy based reinforcement learning

O Nachum, M Norouzi, K Xu… - Advances in neural …, 2017 - proceedings.neurips.cc
We establish a new connection between value and policy based reinforcement learning
(RL) based on a relationship between softmax temporal value consistency and policy …
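The softmax temporal value consistency mentioned in the snippet can be sketched, for an entropy-regularized objective with temperature \tau and deterministic transitions s' = f(s, a) (notation assumed), as a path consistency linking the optimal soft value and the optimal policy:
V^*(s) - \gamma V^*(s') = r(s, a) - \tau \log \pi^*(a \mid s) \quad \text{for all } a, \qquad \pi^*(a \mid s) \propto \exp\!\big( (r(s, a) + \gamma V^*(s')) / \tau \big).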

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org
We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …
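One common special case of such a framework, sketched here with Shannon entropy and assumed notation (the paper itself treats the average-reward setting in greater generality), regularizes the long-run reward with the policy's conditional entropy:
\max_{\pi} \; \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} \big( r(s_t, a_t) + \tau\, \mathcal{H}(\pi(\cdot \mid s_t)) \big) \right].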

Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation

Y Tsurumine, Y Cui, E Uchibe, T Matsubara - Robotics and Autonomous …, 2019 - Elsevier
Deep Reinforcement Learning (DRL), which can learn complex policies with high-
dimensional observations as inputs, e.g., images, has been successfully applied to various …

On stochastic optimal control and reinforcement learning by approximate inference

K Rawlik, M Toussaint, S Vijayakumar - 2013 - direct.mit.edu
We present a reformulation of the stochastic optimal control problem in terms of KL
divergence minimisation, not only providing a unifying perspective of previous approaches …
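A minimal sketch of the KL-minimisation view (control as inference), with assumed notation: introduce binary optimality variables \mathcal{O}_t with p(\mathcal{O}_t = 1 \mid s_t, a_t) \propto \exp\big( r(s_t, a_t) \big); the control problem is then recast as fitting the controlled trajectory distribution q to the posterior over optimal trajectories,
\min_{q} \; \mathrm{KL}\big( q(s_{1:T}, a_{1:T}) \,\big\|\, p(s_{1:T}, a_{1:T} \mid \mathcal{O}_{1:T} = 1) \big).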

Cautiously optimistic policy optimization and exploration with linear function approximation

A Zanette, CA Cheng… - Conference on Learning …, 2021 - proceedings.mlr.press
Policy optimization methods are popular reinforcement learning (RL) algorithms, because
their incremental and on-policy nature makes them more stable than the value-based …

Trust-PCL: An off-policy trust region method for continuous control

O Nachum, M Norouzi, K Xu, D Schuurmans - arXiv preprint arXiv …, 2017 - arxiv.org
Trust region methods, such as TRPO, are often used to stabilize policy optimization
algorithms in reinforcement learning (RL). While current trust region strategies are effective …
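The TRPO-style constrained update that the snippet refers to can be sketched as follows (notation assumed; Trust-PCL itself works off-policy with a relative-entropy regularizer rather than this hard constraint):
\max_{\theta} \; \mathbb{E}_{s, a \sim \pi_{\theta_{\text{old}}}}\!\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A^{\pi_{\theta_{\text{old}}}}(s, a) \right] \quad \text{s.t.} \quad \mathbb{E}_{s}\!\left[ \mathrm{KL}\big( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s) \big) \right] \le \delta.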

Scalable reinforcement learning for plant-wide control of vinyl acetate monomer process

L Zhu, Y Cui, G Takami, H Kanokogi… - Control Engineering …, 2020 - Elsevier
This paper explores a reinforcement learning (RL) approach that designs automatic control
strategies in a large-scale chemical process control scenario as the first step for leveraging …

Goal-aware generative adversarial imitation learning from imperfect demonstration for robotic cloth manipulation

Y Tsurumine, T Matsubara - Robotics and Autonomous Systems, 2022 - Elsevier
Generative Adversarial Imitation Learning (GAIL) can learn policies without
explicitly defining the reward function from demonstrations. GAIL has the potential to learn …
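For reference, the standard GAIL objective that this work builds on can be sketched as a minimax game between a policy \pi and a discriminator D, with expert policy \pi_E, causal entropy \mathcal{H}, and weight \lambda (notation assumed):
\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}\big[ \log D(s, a) \big] + \mathbb{E}_{\pi_E}\big[ \log\big( 1 - D(s, a) \big) \big] - \lambda\, \mathcal{H}(\pi).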

A unified Bellman optimality principle combining reward maximization and empowerment

F Leibfried, S Pascual-Diaz… - Advances in Neural …, 2019 - proceedings.neurips.cc
Empowerment is an information-theoretic method that can be used to intrinsically motivate
learning agents. It attempts to maximize an agent's control over the environment by …
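Empowerment as described in the snippet is usually formalised as a channel capacity, sketched here with assumed notation: the maximal mutual information between an action (or action sequence) a, drawn from a source distribution \omega, and the resulting successor state s',
\mathcal{E}(s) = \max_{\omega(a \mid s)} \; I(a ; s' \mid s).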