Discovering reinforcement learning algorithms

J Oh, M Hessel, WM Czarnecki, Z Xu… - Advances in …, 2020 - proceedings.neurips.cc
Reinforcement learning (RL) algorithms update an agent's parameters according to one of
several possible rules, discovered manually through years of research. Automating the …

Meta-gradient reinforcement learning with an objective discovered online

Z Xu, HP van Hasselt, M Hessel, J Oh… - Advances in …, 2020 - proceedings.neurips.cc
Deep reinforcement learning includes a broad family of algorithms that parameterise an
internal representation, such as a value function or policy, by a deep neural network. Each …

Behavior alignment via reward function optimization

D Gupta, Y Chandak, S Jordan… - Advances in …, 2024 - proceedings.neurips.cc
Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward
specific behaviors is a complex task. This is challenging since it requires the identification of …

Applications of Reinforcement Learning in Finance--Trading with a Double Deep Q-Network

F Zejnullahu, M Moser, J Osterrieder - arXiv preprint arXiv:2206.14267, 2022 - arxiv.org
This paper presents a Double Deep Q-Network algorithm for trading single assets, namely
the E-mini S&P 500 continuous futures contract. We use a proven setup as the foundation for …

Discounted-sum automata with multiple discount factors

U Boker, G Hefetz - arXiv preprint arXiv:2307.08780, 2023 - arxiv.org
Discounting the influence of future events is a key paradigm in economics and it is widely
used in computer-science models, such as games, Markov decision processes (MDPs) …

Distributional meta-gradient reinforcement learning

H Yin, S Yan, Z Xu - The Eleventh International …, 2023 - openreview.net
Meta-gradient reinforcement learning (RL) algorithms have substantially boosted the
performance of RL agents by learning an adaptive return. All the existing algorithms adhere …

Adaptive pairwise weights for temporal credit assignment

Z Zheng, R Vuorio, R Lewis, S Singh - Proceedings of the AAAI …, 2022 - ojs.aaai.org
How much credit (or blame) should an action taken in a state get for a future reward? This is
the fundamental temporal credit assignment problem in Reinforcement Learning (RL). One …

Optimism and Adaptivity in Policy Optimization

V Chelu, T Zahavy, A Guez, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org
We work towards a unifying paradigm for accelerating policy optimization methods in
reinforcement learning (RL) through optimism & adaptivity. Leveraging the …

Advances in Deep Reinforcement Learning: Intrinsic Rewards, Temporal Credit Assignment, State Representations, and Value-equivalent Models

Z Zheng - 2022 - deepblue.lib.umich.edu
Reinforcement learning (RL) is a machine learning paradigm concerned with how an agent
learns to predict and control its own experience stream so as to maximize long-term …

Acceleration in Policy Optimization

V Chelu, T Zahavy, A Guez, D Precup… - … European Workshop on … - openreview.net
We work towards a unifying paradigm for accelerating policy optimization methods in
reinforcement learning (RL) through predictive and adaptive directions of (functional) policy …