Offline reinforcement learning with behavior value regularization

L Huang, B Dong, W Xie… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Offline reinforcement learning (offline RL) aims to find task-solving policies from prerecorded
datasets without online environment interaction. It is unfortunate that extrapolation errors can …

ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning

K Wu, Y Zhao, Z Xu, Z Che, C Yin… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Offline reinforcement learning (RL), which operates solely on static datasets without further
interactions with the environment, provides an appealing alternative to learning a safe and …

Domain: Mildly conservative model-based offline reinforcement learning

XY Liu, XH Zhou, MJ Gui, XL Xie, SQ Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Model-based reinforcement learning (RL), which learns environment model from offline
dataset and generates more out-of-distribution model data, has become an effective …

A Composite Observer-Based Optimal Attitude Tracking Control for FWEPAUV via Reinforcement Learning

N Pang, B Dong, L Huang, Z Hu… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
The foldable wave-energy powered autonomous underwater vehicle (FWEPAUV) is capable
of directly generating sufficient electrical energy from seawater when its body aligns …

Robust Offline Actor-Critic With On-policy Regularized Policy Evaluation

S Cao, X Wang, Y Cheng - IEEE/CAA Journal of Automatica …, 2024 - ieeexplore.ieee.org
To alleviate the extrapolation error and instability inherent in Q-function directly learned by
off-policy Q-learning (QL-style) on static datasets, this article utilizes the on-policy state …

Proximal Policy Optimization with Advantage Reuse Competition

Y Cheng, Q Guo, X Wang - IEEE Transactions on Artificial …, 2024 - ieeexplore.ieee.org
In recent years, reinforcement learning (RL) has made great achievements in artificial
intelligence. Proximal policy optimization (PPO) is a representative RL algorithm, which …

Offline Reinforcement Learning without Regularization and Pessimism

L Huang, B Dong, N Pang, R Liu, W Zhang - Authorea Preprints, 2024 - techrxiv.org
Offline reinforcement learning (RL) learns policies for solving sequential decision problems
directly from offline datasets. Most existing works focus on countering out-of-distribution …