Are we learning yet? A meta review of evaluation failures across machine learning
Many subfields of machine learning share a common stumbling block: evaluation. Advances
in machine learning often evaporate under closer scrutiny or turn out to be less widely …
Autonomous unmanned aerial vehicle navigation using reinforcement learning: A systematic review
There is an increasing demand for using Unmanned Aerial Vehicle (UAV), known as drones,
in different applications such as packages delivery, traffic monitoring, search and rescue …
Mastering diverse domains through world models
D Hafner, J Pasukonis, J Ba, T Lillicrap - arXiv …
Developing a general algorithm that learns to solve tasks across a wide range of
applications has been a fundamental challenge in artificial intelligence. Although current …
Magnetic control of tokamak plasmas through deep reinforcement learning
Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a
promising path towards sustainable energy. A core challenge is to shape and maintain a …
What matters in learning from offline human demonstrations for robot manipulation
A Mandlekar, D Xu, J Wong, S Nasiriany… - arXiv …
The effects of reward misspecification: Mapping and mitigating misaligned models
Reward hacking--where RL agents exploit gaps in misspecified reward functions--has been
widely observed, but not yet systematically studied. To understand how reward hacking …
Decoupling value and policy for generalization in reinforcement learning
R Raileanu, R Fergus - International Conference on …, 2021 - proceedings.mlr.press
Standard deep reinforcement learning algorithms use a shared representation for the policy
and value function, especially when training directly from images. However, we argue that …