Preventing undesirable behavior of intelligent machines

PS Thomas, B Castro da Silva, AG Barto, S Giguere… - Science, 2019 - science.org
Intelligent machines using machine learning algorithms are ubiquitous, ranging from simple
data analysis and pattern recognition tools to complex systems that achieve superhuman …

Time efficiency in optimization with a Bayesian-evolutionary algorithm

G Lan, JM Tomczak, DM Roijers, AE Eiben - Swarm and Evolutionary …, 2022 - Elsevier
Not all generate-and-test search algorithms are created equal. Bayesian Optimization (BO)
invests a lot of computation time to generate the candidate solution that best balances the …

Learning parameterized skills

B Da Silva, G Konidaris, A Barto - arXiv preprint arXiv:1206.6398, 2012 - arxiv.org
We introduce a method for constructing skills capable of solving tasks drawn from a
distribution of parameterized reinforcement learning problems. The method draws example …

Model-based reinforcement learning with parametrized physical models and optimism-driven exploration

C Xie, S Patil, T Moldovan, S Levine… - … conference on robotics …, 2016 - ieeexplore.ieee.org
In this paper, we present a robotic model-based reinforcement learning method that
combines ideas from model identification and model predictive control. We use a feature …

Optimism-driven exploration for nonlinear systems

TM Moldovan, S Levine, MI Jordan… - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
Tasks with unknown dynamics and costly system interaction time present a serious
challenge for reinforcement learning. If a model of the dynamics can be learned quickly …

Heteroscedastic Bayesian optimisation for stochastic model predictive control

R Guzman, R Oliveira, F Ramos - IEEE Robotics and …, 2020 - ieeexplore.ieee.org
Model predictive control (MPC) has been successful in applications involving the control of
complex physical systems. This class of controllers leverages the information provided by an …

Projected natural actor-critic

PS Thomas, WC Dabney, S Giguere… - Advances in neural …, 2013 - proceedings.neurips.cc
Natural actor-critics are a popular class of policy search algorithms for finding locally optimal
policies for Markov decision processes. In this paper we address a drawback of natural actor …

Variable risk control via stochastic optimization

SR Kuindersma, RA Grupen… - … International Journal of …, 2013 - journals.sagepub.com
We present new global and local policy search algorithms suitable for problems with policy-
dependent cost variance (or risk), a property present in many robot control tasks. These …

Active learning of parameterized skills

B Da Silva, G Konidaris, A Barto - … Conference on Machine …, 2014 - proceedings.mlr.press
We introduce a method for actively learning parameterized skills. Parameterized skills are
flexible behaviors that can solve any task drawn from a distribution of parameterized …

On ensuring that intelligent machines are well-behaved

PS Thomas, BC da Silva, AG Barto… - arXiv preprint arXiv …, 2017 - arxiv.org
Machine learning algorithms are everywhere, ranging from simple data analysis and pattern
recognition tools used across the sciences to complex systems that achieve super-human …