Information-directed pessimism for offline reinforcement learning
Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …
Asymmetric actor-critic with approximate information state
Reinforcement learning (RL) for partially observable Markov decision processes (POMDPs)
is a challenging problem because decisions need to be made based on the entire history of …
Decentralized Learning of Finite-Memory Policies in Dec-POMDPs
Multi-agent reinforcement learning (MARL) under partial observability is notoriously
challenging as the agents only have asymmetric partial observations of the system. In this …
Multi-agent reinforcement learning for nonzero-sum Markov games
W Mao - 2024 - ideals.illinois.edu
In recent years, multi-agent reinforcement learning (MARL) has shown remarkable
capabilities in addressing sequential decision-making problems that involve the strategic …