The free energy principle made simpler but not too simple
This paper provides a concise description of the free energy principle, starting from a
formulation of random dynamical systems in terms of a Langevin equation and ending with a …
Deep reinforcement learning: A brief survey
Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence
(AI) and represents a step toward building autonomous systems with a higher-level …
Soft actor-critic algorithms and applications
Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a
range of challenging sequential decision making and control tasks. However, these methods …
A brief survey of deep reinforcement learning
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step
towards building autonomous systems with a higher level understanding of the visual world …
Maximum entropy RL (provably) solves some robust RL problems
Many potential applications of reinforcement learning (RL) require guarantees that the agent
will perform well in the face of disturbances to the dynamics or reward function. In this paper …
Reinforcement learning with deep energy-based policies
We propose a method for learning expressive energy-based policies for continuous states
and actions, which has been feasible only in tabular domains before. We apply our method …
Maximum a posteriori policy optimisation
We introduce a new algorithm for reinforcement learning called Maximum a posteriori Policy
Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show …
Bridging the gap between value and policy based reinforcement learning
We establish a new connection between value and policy based reinforcement learning
(RL) based on a relationship between softmax temporal value consistency and policy …
Path integrals, particular kinds, and strange things
This paper describes a path integral formulation of the free energy principle. The ensuing
account expresses the paths or trajectories that a particle takes as it evolves over time. The …
Learning to be safe: Deep RL with a safety critic
Safety is an essential component for deploying reinforcement learning (RL) algorithms in
real-world scenarios, and is critical during the learning process itself. A natural first approach …