Discovering reinforcement learning algorithms
Reinforcement learning (RL) algorithms update an agent's parameters according to one of
several possible rules, discovered manually through years of research. Automating the …
Meta-gradient reinforcement learning with an objective discovered online
Deep reinforcement learning includes a broad family of algorithms that parameterise an
internal representation, such as a value function or policy, by a deep neural network. Each …
Behavior alignment via reward function optimization
Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward
specific behaviors is a complex task. This is challenging since it requires the identification of …
Applications of Reinforcement Learning in Finance--Trading with a Double Deep Q-Network
F Zejnullahu, M Moser, J Osterrieder - arXiv preprint arXiv:2206.14267, 2022 - arxiv.org
This paper presents a Double Deep Q-Network algorithm for trading single assets, namely
the E-mini S&P 500 continuous futures contract. We use a proven setup as the foundation for …
Discounted-sum automata with multiple discount factors
U Boker, G Hefetz - arXiv preprint arXiv:2307.08780, 2023 - arxiv.org
Discounting the influence of future events is a key paradigm in economics and it is widely
used in computer-science models, such as games, Markov decision processes (MDPs) …
Distributional meta-gradient reinforcement learning
Meta-gradient reinforcement learning (RL) algorithms have substantially boosted the
performance of RL agents by learning an adaptive return. All the existing algorithms adhere …
Adaptive pairwise weights for temporal credit assignment
How much credit (or blame) should an action taken in a state get for a future reward? This is
the fundamental temporal credit assignment problem in Reinforcement Learning (RL). One …
Optimism and Adaptivity in Policy Optimization
We work towards a unifying paradigm for accelerating policy optimization methods in
reinforcement learning (RL) through optimism & adaptivity. Leveraging the …
Advances in Deep Reinforcement Learning: Intrinsic Rewards, Temporal Credit Assignment, State Representations, and Value-equivalent Models
Z Zheng - 2022 - deepblue.lib.umich.edu
Reinforcement learning (RL) is a machine learning paradigm concerned with how an agent
learns to predict and control its own experience stream so as to maximize long-term …
Acceleration in Policy Optimization
We work towards a unifying paradigm for accelerating policy optimization methods in
reinforcement learning (RL) through predictive and adaptive directions of (functional) policy …