Risk-sensitive reinforcement learning applied to control under constraints

P Geibel, F Wysotzki - Journal of Artificial Intelligence Research, 2005 - jair.org
In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states
are those states entering which is undesirable or dangerous. We define the risk with respect …

Hyperbolic discounting and learning over multiple horizons

W Fedus, C Gelada, Y Bengio, MG Bellemare… - arXiv preprint arXiv …, 2019 - arxiv.org
Reinforcement learning (RL) typically defines a discount factor as part of the Markov
Decision Process. The discount factor values future rewards by an exponential scheme that …

Accelerated primal-dual policy optimization for safe reinforcement learning

Q Liang, F Que, E Modiano - arXiv preprint arXiv:1802.06480, 2018 - arxiv.org
Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement
learning tasks with safety constraints, where agents learn a policy that maximizes the long …

Constrained discounted Markov decision processes and Hamiltonian cycles

EA Feinberg - Mathematics of Operations Research, 2000 - pubsonline.informs.org
This paper establishes new links between stochastic and discrete optimization. We consider
the following three problems for discrete time Markov Decision Processes with finite states …

Stationary deterministic policies for constrained MDPs with multiple rewards, costs, and discount factors

DA Dolgov, EH Durfee - IJCAI, 2005 - Citeseer
We consider the problem of policy optimization for a resource-limited agent with multiple
time-dependent objectives, represented as an MDP with multiple discount factors in the …

Markov decision processes

L Kallenberg - Lecture Notes. University of Leiden, 2011 - researchgate.net
Branching out from operations research roots of the 1950s, Markov decision processes
(MDPs) have gained recognition in such diverse fields as economics, telecommunication …

Constrained reinforcement learning from intrinsic and extrinsic rewards

E Uchibe, K Doya - 2007 IEEE 6th International Conference on …, 2007 - ieeexplore.ieee.org
The main objective of a standard reinforcement learner is usually defined as maximization of
a scalar reward function given externally from the environment. On the other hand, an …

Constrained average cost Markov control processes in Borel spaces

O Hernández-Lerma, J González-Hernández… - SIAM Journal on Control …, 2003 - SIAM
This paper considers constrained Markov control processes in Borel spaces, with
unbounded costs. The criterion to be minimized is a long-run expected average cost, and …

Controlled random sequences: methods of convex analysis and problems with functional constraints

AB Piunovskii - Russian Mathematical Surveys, 1998 - iopscience.iop.org
Contents: Introduction. § 1. Controlled random sequences: main definitions and
traditional approaches. § 1.1. Description of the mathematical model. § 1.2. Models with …

Constrained Markov control processes in Borel spaces: the discounted case

O Hernández-Lerma… - Mathematical Methods of …, 2000 - Springer
We consider constrained discounted-cost Markov control processes in Borel spaces, with
unbounded costs. Conditions are given for the constrained problem to be solvable, and also …