Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …
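
The worst-case objective in the snippet above can be written as follows. This is one common formulation, assuming a discounted MDP with reward r, discount factor γ, and an uncertainty set 𝒫 of transition kernels. The notation is illustrative and not taken from the paper, which may define the uncertainty set more specifically.

```latex
\[
V^{\pi}_{\mathcal{P}}(s) = \min_{P \in \mathcal{P}} \mathbb{E}_{\pi, P}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \,\Big|\, s_0 = s\Big],
\qquad
\pi^{\star} \in \arg\max_{\pi} V^{\pi}_{\mathcal{P}}(s).
\]
```

The inner minimum picks the least favorable model in 𝒫 for a fixed policy, and the outer maximum selects the policy that performs best under that adversarial choice.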

A finite-time analysis of two time-scale actor-critic methods

YF Wu, W Zhang, P Xu, Q Gu - Advances in Neural …, 2020 - proceedings.neurips.cc
Actor-critic (AC) methods have exhibited great empirical success compared with other
reinforcement learning algorithms, where the actor uses the policy gradient to improve the …

Provably efficient reinforcement learning for discounted MDPs with feature mapping

D Zhou, J He, Q Gu - International Conference on Machine …, 2021 - proceedings.mlr.press
Modern tasks in reinforcement learning have large state and action spaces. To deal with
them efficiently, one often uses predefined feature mapping to represent states and actions …

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with ε-Greedy Exploration

S Zhang, H Li, M Wang, M Liu… - Advances in …, 2023 - proceedings.neurips.cc
This paper provides a theoretical understanding of deep Q-Network (DQN) with the
ε-greedy exploration in deep reinforcement learning. Despite the tremendous …
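
The ε-greedy rule named in the entry above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name `epsilon_greedy` and its parameters are hypothetical, and `q_values` stands in for the Q-network's outputs for the current state.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Hypothetical helper: return an action index chosen by the epsilon-greedy rule.

    With probability epsilon a uniformly random action is taken (exploration);
    otherwise the action with the largest estimated Q-value is taken (exploitation).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if not q_values:
        raise ValueError("q_values must be non-empty")
    # rng.random() is in [0, 1), so epsilon = 0 never explores and epsilon = 1 always does.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy action; max() keeps the first maximal index, so ties go to the lowest index.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    print(epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0, rng=rng))
```

In DQN-style training, ε is often annealed from a large value toward a small one so that early episodes explore more; that schedule is a common practice, not something stated in the snippet.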

Neural temporal-difference learning converges to global optima

Q Cai, Z Yang, JD Lee, Z Wang - Advances in Neural …, 2019 - proceedings.neurips.cc
Temporal-difference learning (TD), coupled with neural networks, is among the
most fundamental building blocks of deep reinforcement learning. However, due to the …

Actor-critic reinforcement learning for control with stability guarantee

M Han, L Zhang, J Wang, W Pan - IEEE Robotics and …, 2020 - ieeexplore.ieee.org
Reinforcement Learning (RL) and its integration with deep learning have achieved
impressive performance in various robotic control tasks, ranging from motion planning and …

Improving sample complexity bounds for (natural) actor-critic algorithms

T Xu, Z Wang, Y Liang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement
learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and …

Policy optimization provably converges to Nash equilibria in zero-sum linear quadratic games

K Zhang, Z Yang, T Başar - Advances in Neural Information …, 2019 - proceedings.neurips.cc
We study the global convergence of policy optimization for finding the Nash equilibria (NE)
in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of …