An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

[BOOK][B] A concise introduction to decentralized POMDPs

FA Oliehoek, C Amato - 2016 - Springer
This book presents an overview of formal decision making methods for decentralized
cooperative systems. It is aimed at graduate students and researchers in the fields of …

[BOOK][B] Algorithms for reinforcement learning

C Szepesvári - 2022 - books.google.com
Reinforcement learning is a learning paradigm concerned with learning to control a system
so as to maximize a numerical performance measure that expresses a long-term objective …

Active inference and agency: optimal control without cost functions

K Friston, S Samothrakis, R Montague - Biological cybernetics, 2012 - Springer
This paper describes a variational free-energy formulation of (partially observable) Markov
decision problems in decision making under uncertainty. We show that optimal control can …

Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates.

A Solway, MM Botvinick - Psychological review, 2012 - psycnet.apa.org
Recent work has given rise to the view that reward-based decision making is governed by
two key controllers: a habit system, which stores stimulus–response associations shaped by …

Variational policy search via trajectory optimization

S Levine, V Koltun - Advances in neural information …, 2013 - proceedings.neurips.cc
In order to learn effective control policies for dynamical systems, policy search methods must
be able to discover successful executions of the desired task. While random exploration can …

Learning deep neural network policies with continuous memory states

M Zhang, Z McCarthy, C Finn, S Levine… - … on robotics and …, 2016 - ieeexplore.ieee.org
Policy learning for partially observed control tasks requires policies that can remember
salient information from past observations. In this paper, we present a method for learning …

Program synthesis guided reinforcement learning for partially observed environments

Y Yang, JP Inala, O Bastani, Y Pu… - Advances in neural …, 2021 - proceedings.neurips.cc
A key challenge for reinforcement learning is solving long-horizon planning problems.
Recent work has leveraged programs to guide reinforcement learning in these settings …

[PDF][PDF] Probabilistic inference as a model of planned behavior.

M Toussaint - Künstliche Intell., 2009 - Citeseer
The problem of planning and goal-directed behavior has been addressed in computer
science for many years, typically based on classical concepts like Bellman's optimality …

PUMA: Planning under uncertainty with macro-actions

R He, E Brunskill, N Roy - Proceedings of the AAAI Conference on …, 2010 - ojs.aaai.org
Planning in large, partially observable domains is challenging, especially when a long-horizon
lookahead is necessary to obtain a good policy. Traditional POMDP planners that …