A simple unified uncertainty-guided framework for offline-to-online reinforcement learning

S Guo, Y Sun, J Hu, S Huang, H Chen, H Piao… - arXiv preprint arXiv …, 2023 - arxiv.org
Offline reinforcement learning (RL) provides a promising solution to learning an agent fully
relying on a data-driven paradigm. However, constrained by the limited quality of the offline …

Selective imitation for efficient online reinforcement learning with pre-collected data

C Eom, D Lee, M Kwon - ICT Express, 2024 - Elsevier
Deep reinforcement learning (RL) has emerged as a promising solution for autonomous
devices requiring sequential decision-making. In the online RL framework, the agent must …

Temporal logic specification-conditioned decision transformer for offline safe reinforcement learning

Z Guo, W Zhou, W Li - arXiv preprint arXiv:2402.17217, 2024 - arxiv.org
Offline safe reinforcement learning (RL) aims to train a constraint satisfaction policy from a
fixed dataset. Current state-of-the-art approaches are based on supervised learning with a …

Adversarial Conservative Alternating Q-Learning for Credit Card Debt Collection

W Liu, J Zhu, L Ni, J Bi, Z Wu, J Long… - … on Knowledge and …, 2025 - ieeexplore.ieee.org
Debt collection is utilized for risk control after credit card delinquency. The existing rule-
based method tends to be myopic and non-adaptive due to the delayed feedback …

An offline-to-online reinforcement learning approach based on multi-action evaluation with policy extension

X Cheng, X Huang, Z Huang, N Jiang - Applied Intelligence, 2024 - Springer
Offline Reinforcement Learning (Offline RL) is able to learn from pre-collected
offline data without real-time interaction with the environment by policy regularization via …

Transformer-based reinforcement learning for optical cavity temperature control system

H Zhang, Y Lu, C Wang, W Dou, S Liu, C Huang… - Applied …, 2025 - Springer
The accuracy of laser gas detection technology is influenced by the temperature of the
optical cavity. Traditional control methods suffer from inadequacies in fully considering the …

Goal-Conditioned Data Augmentation for Offline Reinforcement Learning

X Huang, D Wu, B Boulet - arXiv preprint arXiv:2412.20519, 2024 - arxiv.org
Offline reinforcement learning (RL) enables policy learning from pre-collected offline
datasets, relaxing the need to interact directly with the environment. However, limited by the …

Towards online training for RL-based query optimizer

M Ramadan, HMO Mokhtar, I Sobh… - International Journal of …, 2024 - Springer
Join query optimization aims to find the best join order for tables in a query, which is critical
for query processing performance. Recently, reinforcement learning models have been …

DRDT3: Diffusion-Refined Decision Test-Time Training Model

X Huang, D Wu, B Boulet - arXiv preprint arXiv:2501.06718, 2025 - arxiv.org
Decision Transformer (DT), a trajectory modeling method, has shown competitive
performance compared to traditional offline reinforcement learning (RL) approaches on …

Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs

X Tang, J Li, N Du, S Xie - arXiv preprint arXiv:2412.07618, 2024 - arxiv.org
Despite the superior performance of large language models on many NLP tasks, they still
face significant limitations in memorizing extensive world knowledge. Recent studies have …