- Academic Search

Discovering reinforcement learning algorithms

J Oh, M Hessel, WM Czarnecki, Z Xu… - Advances in …, 2020 - proceedings.neurips.cc

Reinforcement learning (RL) algorithms update an agent's parameters according to one of
several possible rules, discovered manually through years of research. Automating the …

Cited by 169 · All 8 versions

[PDF] neurips.cc

Meta-gradient reinforcement learning with an objective discovered online

Z Xu, HP van Hasselt, M Hessel, J Oh… - Advances in …, 2020 - proceedings.neurips.cc

Deep reinforcement learning includes a broad family of algorithms that parameterise an
internal representation, such as a value function or policy, by a deep neural network. Each …

Cited by 85 · All 7 versions

[PDF] neurips.cc

Behavior alignment via reward function optimization

D Gupta, Y Chandak, S Jordan… - Advances in …, 2024 - proceedings.neurips.cc

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward
specific behaviors is a complex task. This is challenging since it requires the identification of …

Cited by 11 · All 5 versions

[PDF] arxiv.org

Applications of Reinforcement Learning in Finance--Trading with a Double Deep Q-Network

F Zejnullahu, M Moser, J Osterrieder - arXiv preprint arXiv:2206.14267, 2022 - arxiv.org

This paper presents a Double Deep Q-Network algorithm for trading single assets, namely
the E-mini S&P 500 continuous futures contract. We use a proven setup as the foundation for …

Cited by 8 · All 11 versions
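
The Double Deep Q-Network named in this entry differs from plain DQN in how it forms the bootstrap target: the online network picks the greedy next action and the target network scores it. The following is a minimal sketch of that target for one transition in plain Python. The function name, its arguments, and the three-action example are hypothetical and not taken from the paper, whose trading setup is not shown in this snippet.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Double DQN bootstrap target for a single transition (hypothetical helper).

    The online network's estimates select the greedy next action; the target
    network's estimates evaluate it. Decoupling selection from evaluation is
    what distinguishes Double DQN from the plain DQN max operator.
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # Action selection: argmax over the online network's next-state values.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Action evaluation: look up that action in the target network's values.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    # Hypothetical next-state Q-values for three actions (e.g. short, flat, long).
    online = [0.2, 0.5, 0.1]
    target = [0.3, 0.1, 0.4]
    print(double_dqn_target(1.0, False, online, target, gamma=0.9))
```

With these numbers a plain DQN target would take the target network's own maximum (0.4), giving 1.0 + 0.9 × 0.4 = 1.36. The Double DQN target evaluates the online network's choice (action 1) instead, giving 1.0 + 0.9 × 0.1 = 1.09.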

[PDF] arxiv.org

Discounted-sum automata with multiple discount factors

U Boker, G Hefetz - arXiv preprint arXiv:2307.08780, 2023 - arxiv.org

Discounting the influence of future events is a key paradigm in economics and it is widely
used in computer-science models, such as games, Markov decision processes (MDPs) …

Cited by 8 · All 10 versions
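
The discounting mentioned in this snippet can be made concrete with a small worked example. The first function below is the usual exponentially discounted sum with a single factor γ. The second lets each step carry its own factor, so the weight at step i is scaled by the product of the factors of all earlier steps. This is a hedged sketch of the general idea only: both function names are hypothetical, and the paper's automata may define values differently (automata-theoretic work often divides by a factor λ > 1, which corresponds to γ = 1/λ here).

```python
from typing import Sequence


def discounted_sum(weights: Sequence[float], gamma: float) -> float:
    """Exponentially discounted sum: sum over i of gamma**i * weights[i] (hypothetical helper)."""
    return sum((gamma ** i) * w for i, w in enumerate(weights))


def multi_discounted_sum(weights: Sequence[float], gammas: Sequence[float]) -> float:
    """Discounted sum where step i carries its own factor gammas[i] (hypothetical helper).

    The weight at step i is multiplied by gammas[0] * ... * gammas[i-1],
    so passing the same gamma at every step recovers discounted_sum.
    """
    if len(weights) != len(gammas):
        raise ValueError("weights and gammas must have equal length")
    total, scale = 0.0, 1.0
    for w, g in zip(weights, gammas):
        total += scale * w
        # This step's factor discounts every later weight.
        scale *= g
    return total


if __name__ == "__main__":
    print(discounted_sum([1.0, 1.0, 1.0], 0.5))
    print(multi_discounted_sum([1.0, 1.0, 1.0], [0.5, 0.9, 0.1]))
```

For three unit weights, a constant γ = 0.5 gives 1 + 0.5 + 0.25 = 1.75, while the per-step factors (0.5, 0.9, 0.1) give 1 + 0.5 + 0.45 = 1.95. The last factor has no effect because no weight follows it.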

[PDF] openreview.net

Distributional meta-gradient reinforcement learning

H Yin, S Yan, Z Xu - The Eleventh International …, 2023 - openreview.net

Meta-gradient reinforcement learning (RL) algorithms have substantially boosted the
performance of RL agents by learning an adaptive return. All the existing algorithms adhere …

Cited by 1

[PDF] aaai.org

Adaptive pairwise weights for temporal credit assignment

Z Zheng, R Vuorio, R Lewis, S Singh - Proceedings of the AAAI …, 2022 - ojs.aaai.org

How much credit (or blame) should an action taken in a state get for a future reward? This is
the fundamental temporal credit assignment problem in Reinforcement Learning (RL). One …

Cited by 5 · All 5 versions

[PDF] arxiv.org

Optimism and Adaptivity in Policy Optimization

V Chelu, T Zahavy, A Guez, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org

We work towards a unifying paradigm for accelerating policy optimization methods in
reinforcement learning (RL) through optimism & adaptivity. Leveraging the …

All 2 versions

[PDF] umich.edu

Advances in Deep Reinforcement Learning: Intrinsic Rewards, Temporal Credit Assignment, State Representations, and Value-equivalent Models

Z Zheng - 2022 - deepblue.lib.umich.edu

Reinforcement learning (RL) is a machine learning paradigm concerned with how an agent
learns to predict and control its own experience stream so as to maximize long-term …

Cited by 1

[PDF] openreview.net

Acceleration in Policy Optimization

V Chelu, T Zahavy, A Guez, D Precup… - … European Workshop on … - openreview.net

We work towards a unifying paradigm for accelerating policy optimization methods in
reinforcement learning (RL) through predictive and adaptive directions of (functional) policy …

All 2 versions

Beyond exponentially discounted sum: Automatic learning of return function