Adaptive pessimism via target Q-value for offline reinforcement learning
Offline reinforcement learning (RL) methods learn from datasets without further environment
interaction, facing errors due to out-of-distribution (OOD) actions. Although effective methods …
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of an
offline pretrained policy using only a few online samples. Built on offline RL algorithms, most …
Enhancing Decision-Making in Offline Reinforcement Learning: Adaptive, Multi-Agent, and Online Perspectives
Y Zhang - 2024 - ses.library.usyd.edu.au
Inspired by the successful application of large models in natural language processing and
computer vision, both the research community and industry have increasingly focused on …