Are we learning yet? A meta review of evaluation failures across machine learning

T Liao, R Taori, ID Raji, L Schmidt - Thirty-fifth Conference on …, 2021 - openreview.net
Many subfields of machine learning share a common stumbling block: evaluation. Advances
in machine learning often evaporate under closer scrutiny or turn out to be less widely …

Autonomous unmanned aerial vehicle navigation using reinforcement learning: A systematic review

F AlMahamid, K Grolinger - Engineering Applications of Artificial …, 2022 - Elsevier
There is an increasing demand for using Unmanned Aerial Vehicles (UAVs), known as drones,
in different applications such as package delivery, traffic monitoring, search and rescue …

Mastering diverse domains through world models

D Hafner, J Pasukonis, J Ba, T Lillicrap - arXiv preprint …
Developing a general algorithm that learns to solve tasks across a wide range of
applications has been a fundamental challenge in artificial intelligence. Although current …

Magnetic control of tokamak plasmas through deep reinforcement learning

J Degrave, F Felici, J Buchli, M Neunert, B Tracey… - Nature, 2022 - nature.com
Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a
promising path towards sustainable energy. A core challenge is to shape and maintain a …

What matters in learning from offline human demonstrations for robot manipulation

A Mandlekar, D Xu, J Wong, S Nasiriany… - arXiv preprint …

The effects of reward misspecification: Mapping and mitigating misaligned models

A Pan, K Bhatia, J Steinhardt - arXiv preprint arXiv:2201.03544, 2022 - arxiv.org
Reward hacking--where RL agents exploit gaps in misspecified reward functions--has been
widely observed, but not yet systematically studied. To understand how reward hacking …

Decoupling value and policy for generalization in reinforcement learning

R Raileanu, R Fergus - International Conference on …, 2021 - proceedings.mlr.press
Standard deep reinforcement learning algorithms use a shared representation for the policy
and value function, especially when training directly from images. However, we argue that …