Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Guarantees for epsilon-greedy reinforcement learning with function approximation

C Dann, Y Mansour, M Mohri… - International …, 2022 - proceedings.mlr.press
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to
explore efficiently in some reinforcement learning tasks and yet, they perform well in many …

Motif: Intrinsic motivation from artificial intelligence feedback

M Klissarov, P D'Oro, S Sodhani, R Raileanu… - arXiv preprint arXiv …, 2023 - arxiv.org
Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …

On the importance of exploration for generalization in reinforcement learning

Y Jiang, JZ Kolter, R Raileanu - Advances in Neural …, 2024 - proceedings.neurips.cc
Existing approaches for improving generalization in deep reinforcement learning (RL) have
mostly focused on representation learning, neglecting RL-specific aspects such as …

Reinforcement learning: An overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

Temporal abstraction in reinforcement learning with the successor representation

MC Machado, A Barreto, D Precup… - Journal of Machine …, 2023 - jmlr.org
Reasoning at multiple levels of temporal abstraction is one of the key attributes of
intelligence. In reinforcement learning, this is often modeled through temporally extended …

Deep Laplacian-based options for temporally-extended exploration

M Klissarov, MC Machado - arXiv preprint arXiv:2301.11181, 2023 - arxiv.org
Selecting exploratory actions that generate a rich stream of experience for better learning is
a fundamental challenge in reinforcement learning (RL). An approach to tackle this problem …

UAV path planning optimization strategy: Considerations of urban morphology, microclimate, and energy efficiency using Q-learning algorithm

A Souto, R Alfaia, E Cardoso, J Araújo, C Francês - Drones, 2023 - mdpi.com
The use of unmanned aerial vehicles (UAVs) has been suggested as a potential
communications alternative due to their fast implantation, which makes this resource an …

The phenomenon of policy churn

T Schaul, A Barreto, J Quan… - Advances in Neural …, 2022 - proceedings.neurips.cc
We identify and study the phenomenon of policy churn, that is, the rapid change of the
greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly …

Timing as an Action: Learning When to Observe and Act

H Zhou, A Huang, K Azizzadenesheli… - International …, 2024 - proceedings.mlr.press
In standard reinforcement learning setups, the agent receives observations and performs
actions at evenly spaced intervals. However, in many real-world settings, observations are …