Dual Behavior Regularized Offline Deterministic Actor–Critic

S Cao, X Wang, Y Cheng - IEEE Transactions on Systems …, 2024 - ieeexplore.ieee.org
To mitigate the extrapolation error arising from the offline reinforcement learning (RL) paradigm,
existing methods typically make learned Q-functions over-conservative or enforce global …
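The snippet mentions the common strategy of making learned Q-functions conservative so that out-of-distribution (OOD) actions are not overvalued. As a generic toy illustration of that idea (not this paper's method — all names and constants below are invented for the example), one can penalize Q-values everywhere and refund the penalty only on actions that actually appear in the dataset:

```python
import numpy as np

# Toy setup: one state with 5 discrete actions; the offline dataset
# only ever contains actions 0 and 1 (the in-distribution actions).
n_actions = 5
dataset_actions = np.array([0, 1, 0, 1, 0])
rewards = np.array([1.0, 0.5, 1.0, 0.5, 1.0])

q = np.zeros(n_actions)
alpha_td, alpha_cons = 0.5, 0.1  # TD step size and conservatism strength

for _ in range(200):
    for a, r in zip(dataset_actions, rewards):
        # Standard TD update toward the observed reward (no next state here).
        q[a] += alpha_td * (r - q[a])
    # Conservative penalty: push all Q-values down, then refund the penalty
    # on dataset actions. OOD actions (2, 3, 4) are never corrected by data,
    # so their values are driven steadily downward.
    q -= alpha_cons
    q[np.unique(dataset_actions)] += alpha_cons

# A greedy policy over q now avoids OOD actions.
print(int(np.argmax(q)))
```

The point of the sketch is the asymmetry: data-backed actions keep their TD-learned values, while unsupported actions accumulate an uncorrected penalty, so greedy action selection stays inside the dataset's support.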

Visionary Policy Iteration for Continuous Control

B Dong, L Huang, X Ma, H Chen… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
In this article, a novel visionary policy iteration (VPI) framework is proposed to address the
continuous-action reinforcement learning (RL) tasks. In VPI, a visionary Q-function is …

Adversarial Conservative Alternating Q-Learning for Credit Card Debt Collection

W Liu, J Zhu, L Ni, J Bi, Z Wu, J Long… - … on Knowledge and …, 2025 - ieeexplore.ieee.org
Debt collection is used for risk control after credit card delinquency. Existing rule-based
methods tend to be myopic and non-adaptive due to delayed feedback …

Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling

J Xu, R Yang, F Luo, M Fang, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning policies from offline datasets through offline reinforcement learning (RL) holds
promise for scaling data-driven decision-making and avoiding unsafe and costly online …

A Reinforcement Learning-Based Bi-Population Nutcracker Optimizer for Global Optimization

Y Li, Y Zhang - Biomimetics, 2024 - mdpi.com
The nutcracker optimizer algorithm (NOA) is a recently proposed metaheuristic. The
algorithm simulates the behavior of nutcrackers searching for and storing food in …
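The title describes a bi-population variant, i.e. splitting the population into an exploring group and an exploiting group. As a heavily simplified, generic sketch of that bi-population idea on a toy objective (not the NOA update rules, and not this paper's RL-based scheme — the objective, split, and step sizes are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):
    """Toy objective: minimum value 0 at the origin."""
    return float(np.sum(x ** 2))

dim, pop_size, iters = 3, 20, 300
pop = rng.uniform(-5, 5, size=(pop_size, dim))
best_x, best_f = pop[0].copy(), sphere(pop[0])

for t in range(iters):
    # Track the global best solution found so far.
    for x in pop:
        f = sphere(x)
        if f < best_f:
            best_f, best_x = f, x.copy()
    half = pop_size // 2
    # Exploration subpopulation: large random jumps across the search space.
    pop[:half] = rng.uniform(-5, 5, size=(half, dim))
    # Exploitation subpopulation: small Gaussian steps around the current
    # best, with a step size that shrinks over the run.
    sigma = 1.0 * (1 - t / iters) + 1e-3
    pop[half:] = best_x + rng.normal(0, sigma, size=(pop_size - half, dim))

print(best_f)  # close to 0 after convergence
```

The design choice being illustrated is the division of labor: one subpopulation keeps sampling globally to escape local minima, while the other refines the incumbent with a decaying step size.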

Diffusion Actor with Behavior Critic Guidance Algorithm for Offline Reinforcement Learning

B Dong, L Huang, N Pang, R Liu… - 2024 7th International …, 2024 - ieeexplore.ieee.org
To address the multimodal nature of offline dataset distributions and the overestimation
problem associated with out-of-distribution (OOD) actions, this paper introduces the diffusion …