Reward is enough
In this article we hypothesise that intelligence, and its associated abilities, can be
understood as subserving the maximisation of reward. Accordingly, reward is enough to …
Reinforcement learning for intelligent healthcare applications: A survey
Discovering new treatments and personalizing existing ones is one of the major goals of
modern clinical research. In the last decade, Artificial Intelligence (AI) has enabled the …
A survey of inverse reinforcement learning: Challenges, methods and progress
Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an
agent, given its policy or observed behavior. Analogous to RL, IRL is perceived both as a …
Reinforcement learning and control as probabilistic inference: Tutorial and review
The framework of reinforcement learning or optimal control provides a mathematical
formalization of intelligent decision making that is powerful and broadly applicable. While …
Generative adversarial imitation learning
Consider learning a policy from example expert behavior, without interaction with the expert
or access to a reinforcement signal. One approach is to recover the expert's cost function …
Deep reinforcement learning
Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …
SQIL: Imitation learning via reinforcement learning with sparse rewards
Learning to imitate expert behavior from demonstrations can be challenging, especially in
environments with high-dimensional, continuous observations and unknown dynamics …
CHOMP: Covariant Hamiltonian optimization for motion planning
In this paper, we present CHOMP (covariant Hamiltonian optimization for motion planning),
a method for trajectory optimization invariant to reparametrization. CHOMP uses functional …
Learning multi-agent behaviors from distributed and streaming demonstrations
This paper considers the problem of inferring the behaviors of multiple interacting experts by
estimating their reward functions and constraints where the distributed demonstrated …
Trajectory forecasts in unknown environments conditioned on grid-based plans
We address the problem of forecasting pedestrian and vehicle trajectories in unknown
environments, conditioned on their past motion and scene structure. Trajectory forecasting is …