Recent advances in reinforcement learning in finance
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …
Guarantees for epsilon-greedy reinforcement learning with function approximation
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to
explore efficiently in some reinforcement learning tasks and yet, they perform well in many …
Motif: Intrinsic motivation from artificial intelligence feedback
Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …
On the importance of exploration for generalization in reinforcement learning
Existing approaches for improving generalization in deep reinforcement learning (RL) have
mostly focused on representation learning, neglecting RL-specific aspects such as …
Reinforcement learning: An overview
K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …
Temporal abstraction in reinforcement learning with the successor representation
Reasoning at multiple levels of temporal abstraction is one of the key attributes of
intelligence. In reinforcement learning, this is often modeled through temporally extended …
Deep laplacian-based options for temporally-extended exploration
Selecting exploratory actions that generate a rich stream of experience for better learning is
a fundamental challenge in reinforcement learning (RL). An approach to tackle this problem …
UAV path planning optimization strategy: Considerations of urban morphology, microclimate, and energy efficiency using Q-learning algorithm
The use of unmanned aerial vehicles (UAVs) has been suggested as a potential
communications alternative due to their fast implantation, which makes this resource an …
The phenomenon of policy churn
We identify and study the phenomenon of policy churn, that is, the rapid change of the
greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly …
Timing as an Action: Learning When to Observe and Act
In standard reinforcement learning setups, the agent receives observations and performs
actions at evenly spaced intervals. However, in many real-world settings, observations are …