Explainable reinforcement learning: A survey and comparative review

S Milani, N Topin, M Veloso, F Fang - ACM Computing Surveys, 2024 - dl.acm.org
Explainable reinforcement learning (XRL) is an emerging subfield of explainable machine
learning that has attracted considerable attention in recent years. The goal of XRL is to …

Inductive biases for deep learning of higher-level cognition

A Goyal, Y Bengio - Proceedings of the Royal Society A, 2022 - royalsocietypublishing.org
A fascinating hypothesis is that human and animal intelligence could be explained by a few
principles (rather than an encyclopaedic list of heuristics). If that hypothesis was correct, we …

Principle-driven self-alignment of language models from scratch with minimal human supervision

Z Sun, Y Shen, Q Zhou, H Zhang… - Advances in …, 2023 - proceedings.neurips.cc
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning
(SFT) with human annotations and reinforcement learning from human feedback (RLHF) to …

Do embodied agents dream of pixelated sheep: Embodied decision making using language guided world modelling

K Nottingham, P Ammanabrolu, A Suhr… - International …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of
the world. However, if initialized with knowledge of high-level subgoals and transitions …

Interpretable reward redistribution in reinforcement learning: A causal approach

Y Zhang, Y Du, B Huang, Z Wang… - Advances in …, 2023 - proceedings.neurips.cc
A major challenge in reinforcement learning is to determine which state-action pairs are
responsible for future rewards that are delayed. Reward redistribution serves as a solution to …

SALMON: Self-alignment with instructable reward models

Z Sun, Y Shen, H Zhang, Q Zhou, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement
Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM …

A dataset perspective on offline reinforcement learning

K Schweighofer, M Dinu, A Radler… - Conference on …, 2022 - proceedings.mlr.press
The application of Reinforcement Learning (RL) in real world environments can be
expensive or risky due to sub-optimal policies during training. In Offline RL, this problem is …