A review of robot learning for manipulation: Challenges, representations, and algorithms

O Kroemer, S Niekum, G Konidaris - Journal of machine learning research, 2021 - jmlr.org
A key challenge in intelligent robotics is creating robots that are capable of directly
interacting with the world around them to achieve their goals. The last decade has seen …

Conservative Q-learning for offline reinforcement learning

A Kumar, A Zhou, G Tucker… - Advances in Neural …, 2020 - proceedings.neurips.cc
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is
a key challenge for large-scale real-world applications. Offline RL algorithms promise to …

Off-policy deep reinforcement learning without exploration

S Fujimoto, D Meger, D Precup - … conference on machine …, 2019 - proceedings.mlr.press
Many practical applications of reinforcement learning constrain agents to learn from a fixed
batch of data which has already been gathered, without offering further possibility for data …

The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care

M Komorowski, LA Celi, O Badawi, AC Gordon… - Nature medicine, 2018 - nature.com
Sepsis is the third leading cause of death worldwide and the main cause of mortality in
hospitals, but the best treatment strategy remains uncertain. In particular, evidence …

Doubly robust off-policy value evaluation for reinforcement learning

N Jiang, L Li - International conference on machine learning, 2016 - proceedings.mlr.press
We study the problem of off-policy value evaluation in reinforcement learning (RL), where
one aims to estimate the value of a new policy based on data collected by a different policy …

Provably good batch off-policy reinforcement learning without great exploration

Y Liu, A Swaminathan, A Agarwal… - Advances in neural …, 2020 - proceedings.neurips.cc
Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes
tasks. Doing batch RL in a way that yields a reliable new policy in large domains is …

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …

Decision-making under uncertainty: beyond probabilities: Challenges and perspectives

T Badings, TD Simão, M Suilen, N Jansen - International Journal on …, 2023 - Springer
This position paper reflects on the state-of-the-art in decision-making under uncertainty. A
classical assumption is that probabilities can sufficiently capture all uncertainty in a system …

More robust doubly robust off-policy evaluation

M Farajtabar, Y Chow… - … on Machine Learning, 2018 - proceedings.mlr.press
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …

Preventing undesirable behavior of intelligent machines

PS Thomas, B Castro da Silva, AG Barto, S Giguere… - Science, 2019 - science.org
Intelligent machines using machine learning algorithms are ubiquitous, ranging from simple
data analysis and pattern recognition tools to complex systems that achieve superhuman …