Recent advances in reinforcement learning in finance
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …
Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
The curious price of distributional robustness in reinforcement learning with a generative model
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …
Settling the sample complexity of model-based offline reinforcement learning
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …
Almost optimal model-free reinforcement learning via reference-advantage decomposition
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …
Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping the curse of horizon
Episodic reinforcement learning and contextual bandits are two widely studied sequential
decision-making problems. Episodic reinforcement learning generalizes contextual bandits …
Logarithmic regret for reinforcement learning with linear function approximation
Reinforcement learning (RL) with linear function approximation has received increasing
attention recently. However, existing work has focused on obtaining $\sqrt {T} $-type regret …
Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium
Q Xie
Modern tasks in reinforcement learning have large state and action spaces. To deal with
them efficiently, one often uses predefined feature mapping to represent states and actions …
Learning adversarial Markov decision processes with bandit feedback and unknown transition
We consider the task of learning in episodic finite-horizon Markov decision processes with
an unknown transition function, bandit feedback, and adversarial losses. We propose an …