Adaptive pessimism via target Q-value for offline reinforcement learning

J Liu, Y Zhang, C Li, Y Yang, Y Liu, W Ouyang - Neural Networks, 2024 - Elsevier
Offline reinforcement learning (RL) methods learn from datasets without further environment
interaction, facing errors due to out-of-distribution (OOD) actions. Although effective methods …

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

Y Zhang, J Liu, C Li, Y Niu, Y Yang, Y Liu… - Proceedings of the …, 2024 - ojs.aaai.org
Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of
offline pretrained policy using only a few online samples. Built on offline RL algorithms, most …

Enhancing Decision-Making in Offline Reinforcement Learning: Adaptive, Multi-Agent, and Online Perspectives

Y Zhang - 2024 - ses.library.usyd.edu.au
Inspired by the successful application of large models in natural language processing and
computer vision, both the research community and industry have increasingly focused on …