The free energy principle made simpler but not too simple

K Friston, L Da Costa, N Sajid, C Heins, K Ueltzhöffer… - Physics Reports, 2023 - Elsevier
This paper provides a concise description of the free energy principle, starting from a
formulation of random dynamical systems in terms of a Langevin equation and ending with a …
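The Langevin equation named in this snippet describes a state x that follows a deterministic flow f(x) while being perturbed by random fluctuations. As a minimal sketch, and not the paper's own derivation, the Python below integrates dx = f(x) dt + sqrt(2Γ) dW with the Euler-Maruyama scheme. The linear drift f(x) = -θx, the sqrt(2Γ) noise convention, and the function names are hypothetical choices made only for illustration.

```python
import math
import random


def euler_maruyama(drift, gamma, x0, dt, n_steps, rng):
    """Simulate dx = drift(x) dt + sqrt(2*gamma) dW (assumed noise convention)."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Increment of a Wiener process over a step of length dt: N(0, dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + math.sqrt(2.0 * gamma) * dw
        path.append(x)
    return path


def ou_drift(x, theta=1.0):
    # Hypothetical linear (Ornstein-Uhlenbeck) drift pulling x back toward 0
    return -theta * x
```

For this linear drift the stationary variance is Γ/θ, so simulating many independent paths (for instance Γ = 0.5, θ = 1, dt = 0.01, 500 steps, started at 0) should give an empirical variance of the final states close to 0.5.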

Deep reinforcement learning: A brief survey

K Arulkumaran, MP Deisenroth… - IEEE Signal …, 2017 - ieeexplore.ieee.org
Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence
(AI) and represents a step toward building autonomous systems with a higher-level …

Soft actor-critic algorithms and applications

T Haarnoja, A Zhou, K Hartikainen, G Tucker… - arXiv preprint arXiv …, 2018 - arxiv.org
Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a
range of challenging sequential decision making and control tasks. However, these methods …

A brief survey of deep reinforcement learning

K Arulkumaran, MP Deisenroth, M Brundage… - arXiv preprint arXiv …, 2017 - arxiv.org
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step
towards building autonomous systems with a higher level understanding of the visual world …

Maximum entropy RL (provably) solves some robust RL problems

B Eysenbach, S Levine - arXiv preprint arXiv:2103.06257, 2021 - arxiv.org
Many potential applications of reinforcement learning (RL) require guarantees that the agent
will perform well in the face of disturbances to the dynamics or reward function. In this paper …

Reinforcement learning with deep energy-based policies

T Haarnoja, H Tang, P Abbeel… - … conference on machine …, 2017 - proceedings.mlr.press
We propose a method for learning expressive energy-based policies for continuous states
and actions, which has been feasible only in tabular domains before. We apply our method …
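The energy-based policies in this entry take the form π(a|s) ∝ exp(Q(s,a)/α), where the normaliser defines a soft value V(s) = α log Σ_a exp(Q(s,a)/α). The paper targets continuous actions, where that sum becomes an integral that has to be approximated; the sketch below instead assumes a small discrete action set so the normalisation can be computed exactly. The function names and the temperature parameter alpha are illustrative, not an API from the paper.

```python
import math


def soft_value(q_values, alpha):
    """Soft (log-sum-exp) value V = alpha * log sum_a exp(Q(a) / alpha)."""
    m = max(q_values)  # subtract the max before exponentiating, for numerical stability
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def boltzmann_policy(q_values, alpha):
    """Energy-based policy pi(a) = exp((Q(a) - V) / alpha); sums to 1 by construction."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]
```

With equal Q-values the policy is uniform and the soft value equals that common Q-value plus α log(number of actions); as alpha shrinks, the policy concentrates on the greedy action.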

Maximum a posteriori policy optimisation

A Abdolmaleki, JT Springenberg, Y Tassa… - arXiv preprint arXiv …, 2018 - arxiv.org
We introduce a new algorithm for reinforcement learning called Maximum a posteriori Policy
Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show …

Bridging the gap between value and policy based reinforcement learning

O Nachum, M Norouzi, K Xu… - Advances in neural …, 2017 - proceedings.neurips.cc
We establish a new connection between value and policy based reinforcement learning
(RL) based on a relationship between softmax temporal value consistency and policy …
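The softmax temporal consistency referred to here relates the optimal entropy-regularised value V* and policy π* across a single step of a deterministic environment: V*(s) − γV*(s′) = r(s,a) − τ log π*(a|s), and it holds for every action a, not only the greedy one. A minimal check of this identity is sketched below, assuming a hypothetical two-state deterministic MDP and illustrative values for the discount gamma and temperature tau: soft values come from log-sum-exp backups, the policy from an explicitly normalised softmax, and the per-step residual should vanish up to rounding.

```python
import math

# Hypothetical deterministic MDP: state -> list of (action, reward, next_state).
# next_state None marks termination, whose value is fixed at 0.
# Successors are listed before predecessors so one backward pass suffices.
MDP = {
    "s1": [("left", 2.0, None), ("right", -1.0, None)],
    "s0": [("left", 0.0, "s1"), ("right", 1.0, None)],
}


def q_values(mdp, v, s, gamma):
    """Q(s, a) = r(s, a) + gamma * V(s') for each action of state s."""
    return [r + gamma * v[nxt] for _, r, nxt in mdp[s]]


def soft_values(mdp, gamma, tau):
    """Softmax (log-sum-exp) backups: V(s) = tau * log sum_a exp(Q(s, a) / tau)."""
    v = {None: 0.0}
    for s in mdp:
        qs = q_values(mdp, v, s, gamma)
        m = max(qs)  # shift by the max for numerical stability
        v[s] = m + tau * math.log(sum(math.exp((q - m) / tau) for q in qs))
    return v


def softmax_policy(qs, tau):
    """pi(a|s) proportional to exp(Q(s, a) / tau), normalised explicitly."""
    m = max(qs)
    w = [math.exp((q - m) / tau) for q in qs]
    z = sum(w)
    return [x / z for x in w]


def consistency_residuals(mdp, gamma, tau):
    """-V(s) + gamma * V(s') + r - tau * log pi(a|s) for every (s, a) pair."""
    v = soft_values(mdp, gamma, tau)
    res = {}
    for s, actions in mdp.items():
        pi = softmax_policy(q_values(mdp, v, s, gamma), tau)
        for (a, r, nxt), p in zip(actions, pi):
            res[(s, a)] = -v[s] + gamma * v[nxt] + r - tau * math.log(p)
    return res
```

Swapping softmax_policy for a non-optimal policy (a uniform one, say) makes the residuals non-zero; the paper builds its training objective around driving such consistency errors to zero.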

Path integrals, particular kinds, and strange things

K Friston, L Da Costa, DAR Sakthivadivel, C Heins… - Physics of Life …, 2023 - Elsevier
This paper describes a path integral formulation of the free energy principle. The ensuing
account expresses the paths or trajectories that a particle takes as it evolves over time. The …

Learning to be safe: Deep RL with a safety critic

K Srinivasan, B Eysenbach, S Ha, J Tan… - arXiv preprint arXiv …, 2020 - arxiv.org
Safety is an essential component for deploying reinforcement learning (RL) algorithms in
real-world scenarios, and is critical during the learning process itself. A natural first approach …