Maximum entropy exploration in contextual bandits with neural networks and energy based models

A Elwood, M Leonardi, A Mohamed, A Rozza - Entropy, 2023 - mdpi.com
Contextual bandits can solve a huge range of real-world problems. However, current
popular algorithms to solve them either rely on linear models or unreliable uncertainty …

Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning

J Lee, J Heo, D Kim, G Lee, S Oh - 2023 IEEE/RSJ International …, 2023 - ieeexplore.ieee.org
Satisfying safety constraints in reinforcement learning (RL) is an important issue, especially
in real-world applications. Many studies have approached safe RL with the Lagrangian …