Motion planning for autonomous driving: The state of the art and future perspectives

S Teng, X Hu, P Deng, B Li, Y Li, Y Ai… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Intelligent vehicles (IVs) have gained worldwide attention due to their increased
convenience, safety advantages, and potential commercial value. Despite predictions of …

Knowledge-integrated machine learning for materials: lessons from gameplaying and robotics

K Hippalgaonkar, Q Li, X Wang, JW Fisher III… - Nature Reviews …, 2023 - nature.com
As materials researchers increasingly embrace machine-learning (ML) methods, it is natural
to wonder what lessons can be learned from other fields undergoing similar developments …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Principled reinforcement learning with human feedback from pairwise or k-wise comparisons

B Zhu, M Jordan, J Jiao - International Conference on …, 2023 - proceedings.mlr.press
We provide a theoretical framework for Reinforcement Learning with Human Feedback
(RLHF). We show that when the underlying true reward is linear, under both Bradley-Terry …

Video pretraining (VPT): Learning to act by watching unlabeled online videos

B Baker, I Akkaya, P Zhokov… - Advances in …, 2022 - proceedings.neurips.cc
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for
training models with broad, general capabilities for text, images, and other modalities …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

A survey of zero-shot generalisation in deep reinforcement learning

R Kirk, A Zhang, E Grefenstette, T Rocktäschel - Journal of Artificial …, 2023 - jair.org
The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to
produce RL algorithms whose policies generalise well to novel unseen situations at …

Defining and characterizing reward gaming

J Skalse, N Howe… - Advances in Neural …, 2022 - proceedings.neurips.cc
We provide the first formal definition of reward hacking, a phenomenon where
optimizing an imperfect proxy reward function, $\tilde{\mathcal{R}}$, leads to poor …

Deep reinforcement learning in smart manufacturing: A review and prospects

C Li, P Zheng, Y Yin, B Wang, L Wang - CIRP Journal of Manufacturing …, 2023 - Elsevier
To facilitate the personalized smart manufacturing paradigm with cognitive automation
capabilities, Deep Reinforcement Learning (DRL) has attracted ever-increasing attention by …

Reinforcement learning algorithms: A brief survey

AK Shakya, G Pillai, S Chakrabarty - Expert Systems with Applications, 2023 - Elsevier
Reinforcement Learning (RL) is a machine learning (ML) technique to learn sequential
decision-making in complex problems. RL is inspired by trial-and-error based human/animal …