Reward is enough

D Silver, S Singh, D Precup, RS Sutton - Artificial Intelligence, 2021 - Elsevier
In this article we hypothesise that intelligence, and its associated abilities, can be
understood as subserving the maximisation of reward. Accordingly, reward is enough to …

Reinforcement learning for intelligent healthcare applications: A survey

A Coronato, M Naeem, G De Pietro, et al. - Artificial Intelligence in Medicine, 2020 - Elsevier
Discovering new treatments and personalizing existing ones is one of the major goals of
modern clinical research. In the last decade, Artificial Intelligence (AI) has enabled the …

A survey of inverse reinforcement learning: Challenges, methods and progress

S Arora, P Doshi - Artificial Intelligence, 2021 - Elsevier
Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an
agent, given its policy or observed behavior. Analogous to RL, IRL is perceived both as a …
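
As a toy illustration of the problem statement above, the sketch below alternates between solving the forward RL problem under a current reward guess and nudging the reward weights so the learner's visitation features match the expert's. The 5-state chain, one-hot features, and step sizes are all invented for illustration, not any specific method from the survey.

```python
import numpy as np

# Toy 5-state chain MDP: actions 0 = left, 1 = right, deterministic moves.
n_states, n_actions, horizon, gamma = 5, 2, 10, 0.9

def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

def value_iteration(reward):
    """Greedy policy for a state-only reward via value iteration."""
    V = np.zeros(n_states)
    for _ in range(100):
        Q = np.array([[reward[step(s, a)] + gamma * V[step(s, a)]
                       for a in range(n_actions)] for s in range(n_states)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def visitation(policy, start=0):
    """Discounted state-visitation counts (one-hot state features)."""
    mu, s = np.zeros(n_states), start
    for t in range(horizon):
        mu[s] += gamma ** t
        s = step(s, policy[s])
    return mu

# "Expert" always moves right; its feature expectations are the target.
expert_mu = visitation(np.ones(n_states, dtype=int))

w = np.zeros(n_states)                       # linear reward weights to infer
for it in range(50):
    policy = value_iteration(w)              # forward problem under guess w
    w += 0.1 * (expert_mu - visitation(policy))   # feature-matching step

print("recovered reward weights:", np.round(w, 2))
print("greedy policy under recovered reward:", value_iteration(w))
```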

Reinforcement learning and control as probabilistic inference: Tutorial and review

S Levine - arXiv preprint arXiv:1805.00909, 2018 - arxiv.org
The framework of reinforcement learning or optimal control provides a mathematical
formalization of intelligent decision making that is powerful and broadly applicable. While …
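
The pivotal move in this framework is replacing the hard Bellman max with a log-sum-exp ("soft") maximum, under which the optimal policy becomes a softmax over Q-values: V(s) = log Σ_a exp Q(s,a) and π(a|s) = exp(Q(s,a) − V(s)). A minimal sketch on an invented 5-state chain:

```python
import numpy as np

# Soft value iteration: the log-sum-exp backup replaces the hard max, and
# the resulting policy is a softmax over Q-values. Environment and reward
# are invented for illustration.
n_states, n_actions, gamma = 5, 2, 0.9
reward = np.array([0., 0., 0., 0., 1.])     # reward for landing in a state

def step(s, a):                              # deterministic chain dynamics
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

V = np.zeros(n_states)
for _ in range(200):
    Q = np.array([[reward[step(s, a)] + gamma * V[step(s, a)]
                   for a in range(n_actions)] for s in range(n_states)])
    V = np.log(np.exp(Q).sum(axis=1))        # soft maximum over actions

pi = np.exp(Q - V[:, None])                  # softmax policy, rows sum to 1
print(np.round(pi, 3))
```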

Generative adversarial imitation learning

J Ho, S Ermon - Advances in Neural Information Processing Systems, 2016 - proceedings.neurips.cc
Consider learning a policy from example expert behavior, without interaction with the expert
or access to a reinforcement signal. One approach is to recover the expert's cost function …
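
The adversarial loop can be caricatured in a few lines: a discriminator learns to tell policy samples from expert samples, and the policy is trained with the surrogate reward −log D for fooling it. The single-state bandit below is an invented toy; the paper itself uses neural-network discriminators and TRPO policy updates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
expert_action = 2                        # toy "expert" always picks action 2

theta = np.zeros(n_actions)              # policy logits (softmax policy)
d = np.zeros(n_actions)                  # discriminator logits per action

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for it in range(3000):
    pi = softmax(theta)
    a = int(rng.choice(n_actions, p=pi))  # sample from the current policy

    # Discriminator ascent: D(x) = sigmoid(d[x]) estimates the probability
    # that action x came from the policy rather than the expert.
    D_a = 1.0 / (1.0 + np.exp(-d[a]))
    D_e = 1.0 / (1.0 + np.exp(-d[expert_action]))
    d[a] += 0.05 * (1.0 - D_a)            # gradient of log D(a)
    d[expert_action] -= 0.05 * D_e        # gradient of log(1 - D(expert))

    # Policy step (REINFORCE) on the surrogate reward -log D(a): actions
    # the discriminator mistakes for expert ones earn more reward.
    r = -np.log(D_a + 1e-8)
    grad_logpi = -pi.copy()
    grad_logpi[a] += 1.0
    theta += 0.05 * r * grad_logpi

print("final policy over actions:", np.round(softmax(theta), 3))
```

Because the non-expert actions drive their own discriminator logits up, their surrogate reward decays toward zero, so the policy mass drifts onto the expert action.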

Deep reinforcement learning

SE Li - Reinforcement Learning for Sequential Decision and Optimal Control, 2023 - Springer
Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …

SQIL: Imitation learning via reinforcement learning with sparse rewards

S Reddy, AD Dragan, S Levine - arXiv preprint arXiv:1905.11108, 2019 - arxiv.org
Learning to imitate expert behavior from demonstrations can be challenging, especially in
environments with high-dimensional, continuous observations and unknown dynamics …
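
The SQIL recipe itself is simple to sketch: label demonstration transitions with reward +1, the agent's own transitions with reward 0, and run ordinary Q-learning on a balanced mix of both buffers. The tabular chain environment below is an invented stand-in (the paper uses soft Q-learning on high-dimensional tasks):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.5, 0.2

def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

# Demonstrations: expert always moves right; reward is fixed to +1.
demo = [(s, 1, step(s, 1), 1.0) for s in range(n_states)]
online = []                                  # agent transitions, reward 0

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(5000):
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s2 = step(s, a)
    online.append((s, a, s2, 0.0))
    s = s2
    # Balanced minibatch: one demo transition, one online transition.
    for (bs, ba, bs2, br) in (demo[rng.integers(len(demo))],
                              online[rng.integers(len(online))]):
        Q[bs, ba] += alpha * (br + gamma * Q[bs2].max() - Q[bs, ba])

print("greedy actions per state:", Q.argmax(axis=1))
```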

CHOMP: Covariant Hamiltonian optimization for motion planning

M Zucker, N Ratliff, AD Dragan, et al. - The International Journal of Robotics Research, 2013 - journals.sagepub.com
In this paper, we present CHOMP (covariant Hamiltonian optimization for motion planning),
a method for trajectory optimization invariant to reparametrization. CHOMP uses functional …
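
The signature update is a gradient step preconditioned by the inverse of a smoothness metric A, so obstacle forces get spread along the whole trajectory instead of kinking individual waypoints. A minimal sketch with one invented circular obstacle (geometry and weights are illustrative, not from the paper):

```python
import numpy as np

T = 30                                        # number of interior waypoints
start, goal = np.array([0.0, 0.0]), np.array([1.0, 0.0])
xi = np.linspace(start, goal, T + 2)[1:-1]    # straight-line initialization

# Smoothness metric A = K^T K from the finite-difference operator K.
K = np.eye(T + 1, T, k=0) - np.eye(T + 1, T, k=-1)
A = K.T @ K
A_inv = np.linalg.inv(A)

center, radius = np.array([0.5, 0.02]), 0.2   # one circular obstacle

def obstacle_grad(pts):
    """Gradient of the quadratic hinge cost 0.5*(radius - dist)^2 inside."""
    diff = pts - center
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    inside = (dist < radius).astype(float)
    return -inside * (radius - dist) * diff / np.maximum(dist, 1e-8)

for it in range(300):
    smooth_grad = A @ xi                      # pulls waypoints toward a line
    smooth_grad[0] -= start                   # fixed-endpoint corrections
    smooth_grad[-1] -= goal
    grad = smooth_grad + obstacle_grad(xi)
    xi -= 0.05 * (A_inv @ grad)               # covariant (metric-warped) step

clearance = np.linalg.norm(xi - center, axis=1).min() - radius
print("worst waypoint clearance:", round(float(clearance), 3))
```

Dropping the A_inv preconditioner recovers plain gradient descent, which deforms only the waypoints inside the obstacle and produces the kinked trajectories the covariant update is designed to avoid.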

Learning multi-agent behaviors from distributed and streaming demonstrations

S Liu, M Zhu - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
This paper considers the problem of inferring the behaviors of multiple interacting experts by
estimating their reward functions and constraints, where the distributed demonstration …
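
As an illustration of the distributed, streaming flavor only (this is the generic consensus-averaging pattern, not the paper's algorithm), each networked learner below runs a local streaming update on its private data and mixes estimates with its ring neighbors:

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0, 0.5])        # hypothetical reward weights
n_learners, dim = 4, 3
W = np.zeros((n_learners, dim))            # per-learner running estimates

# Doubly stochastic mixing matrix for a 4-node ring network.
mix = np.array([[.50, .25, .00, .25],
                [.25, .50, .25, .00],
                [.00, .25, .50, .25],
                [.25, .00, .25, .50]])

for t in range(1, 2001):
    # Each learner sees one noisy private sample of the unknown weights.
    sample = true_w + rng.normal(scale=0.5, size=(n_learners, dim))
    W += (1.0 / t) * (sample - W)          # local streaming update
    W = mix @ W                            # consensus averaging with neighbors

print("consensus estimates:\n", np.round(W, 2))
```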

Trajectory forecasts in unknown environments conditioned on grid-based plans

N Deo, MM Trivedi - arXiv preprint arXiv:2001.00735, 2020 - arxiv.org
We address the problem of forecasting pedestrian and vehicle trajectories in unknown
environments, conditioned on their past motion and scene structure. Trajectory forecasting is …
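
A sketch of just the grid "plan" stage (the occupancy grid, costs, and goal below are invented, and the learned trajectory decoder that the paper conditions on such plans is omitted): soft value iteration over a coarse grid, then sampling a cell sequence from the induced softmax policy.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, gamma = 6, 6, 0.95
occ = np.zeros((H, W)); occ[2, 1:5] = 1.0   # a wall of occupied cells
reward = -0.1 - 5.0 * occ                   # step cost plus obstacle penalty
goal = (5, 5); reward[goal] = 1.0

moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
def nbr(r, c, m):
    r2, c2 = r + m[0], c + m[1]
    return (r2, c2) if 0 <= r2 < H and 0 <= c2 < W else (r, c)

V = np.zeros((H, W))
for _ in range(100):                        # soft value iteration
    newV = np.empty_like(V)
    for r in range(H):
        for c in range(W):
            q = [reward[nbr(r, c, m)] + gamma * V[nbr(r, c, m)]
                 for m in moves]
            newV[r, c] = np.log(np.sum(np.exp(q)))
    V = newV

cell, plan = (0, 0), [(0, 0)]               # sample one plan from (0, 0)
for _ in range(15):
    q = np.array([reward[nbr(*cell, m)] + gamma * V[nbr(*cell, m)]
                  for m in moves])
    p = np.exp(q - q.max()); p /= p.sum()   # softmax over move values
    cell = nbr(*cell, moves[int(rng.choice(4, p=p))])
    plan.append(cell)
print("sampled grid plan:", plan)
```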