Constrained update projection approach to safe policy optimization

L Yang, J Ji, J Dai, L Zhang, B Zhou… - Advances in …, 2022 - proceedings.neurips.cc
Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only
maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a …

A general sample complexity analysis of vanilla policy gradient

R Yuan, RM Gower, A Lazaric - International Conference on …, 2022 - proceedings.mlr.press
We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in
non-convex optimization to obtain convergence and sample complexity guarantees for the …

Sample efficient policy gradient methods with recursive variance reduction

P Xu, F Gao, Q Gu - arXiv preprint arXiv:1909.08610, 2019 - arxiv.org
Improving the sample efficiency in reinforcement learning has been a long-standing
research problem. In this work, we aim to reduce the sample complexity of existing policy …

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …

Off-policy proximal policy optimization

W Meng, Q Zheng, G Pan, Y Yin - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Proximal Policy Optimization (PPO) is an important reinforcement learning method,
which has achieved great success in sequential decision-making problems. However, PPO …

Wastewater treatment monitoring: Fault detection in sensors using transductive learning and improved reinforcement learning

J Yang, K Tian, H Zhao, Z Feng, S Bourouis… - Expert Systems with …, 2025 - Elsevier
Wastewater treatment plants (WWTPs) increasingly utilize sensors to optimize operations
and ensure treated water quality. These sensors' rich datasets are well-suited for automated …

Stock market prediction with transductive long short-term memory and social media sentiment analysis

A Peivandizadeh, S Hatami, A Nakhjavani… - IEEE …, 2024 - ieeexplore.ieee.org
In an era dominated by digital communication, the vast amounts of data generated from
social media and financial markets present unique opportunities and challenges for …

Seismonet: A proximal policy optimization-based earthquake early warning system using dilated convolution layers and online data augmentation

S Banar, R Mohammadi - Expert Systems with Applications, 2024 - Elsevier
In seismic safety, Earthquake Early Warning (EEW) systems are indispensable for
mitigating earthquake hazards. These systems strive to quickly evaluate earthquake …