Maximum entropy exploration in contextual bandits with neural networks and energy based models
Contextual bandits can solve a huge range of real-world problems. However, current
popular algorithms to solve them either rely on linear models or unreliable uncertainty …
Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning
Satisfying safety constraints in reinforcement learning (RL) is an important issue, especially
in real-world applications. Many studies have approached safe RL with the Lagrangian …