Human-timescale adaptation in an open-ended task space

AA Team, J Bauer, K Baumli, S Baveja… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation models have shown impressive adaptation and scalability in supervised and self-
supervised learning problems, but so far these successes have not fully translated to …

Human-timescale adaptation in an open-ended task space

J Bauer, K Baumli, F Behbahani… - International …, 2023 - proceedings.mlr.press
Foundation models have shown impressive adaptation and scalability in supervised and self-
supervised learning problems, but so far these successes have not fully translated to …

Reinforcement learning: An overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

Meta-Explore: Exploratory hierarchical vision-and-language navigation using scene object spectrum grounding

M Hwang, J Jeong, M Kim, Y Oh… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
The main challenge in vision-and-language navigation (VLN) is how to understand natural-
language instructions in an unseen environment. The main limitation of conventional VLN …

A mixture of surprises for unsupervised reinforcement learning

A Zhao, M Lin, Y Li, YJ Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Unsupervised reinforcement learning aims at learning a generalist policy in a reward-free
manner for fast adaptation to downstream tasks. Most of the existing methods propose to …

Planning goals for exploration

ES Hu, R Chang, O Rybkin, D Jayaraman - arXiv preprint arXiv …, 2023 - arxiv.org
Dropped into an unknown environment, what should an agent do to quickly learn about the
environment and how to accomplish diverse tasks within it? We address this question within …

The phenomenon of policy churn

T Schaul, A Barreto, J Quan… - Advances in Neural …, 2022 - proceedings.neurips.cc
We identify and study the phenomenon of policy churn, that is, the rapid change of the
greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly …

Boundless Socratic Learning with Language Games

T Schaul - arXiv preprint arXiv:2411.16905, 2024 - arxiv.org
An agent trained within a closed system can master any desired capability, as long as the
following three conditions hold: (a) it receives sufficiently informative and aligned …

DEP-RL: Embodied exploration for reinforcement learning in overactuated and musculoskeletal systems

P Schumacher, D Häufle, D Büchler, S Schmitt… - arXiv preprint arXiv …, 2022 - arxiv.org
Muscle-actuated organisms are capable of learning an unparalleled diversity of dexterous
movements despite their vast amount of muscles. Reinforcement learning (RL) on large …

WToE: Learning when to explore in multiagent reinforcement learning

S Dong, H Mao, S Yang, S Zhu, W Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Existing multiagent exploration works focus on how to explore in the fully cooperative task,
which is insufficient in the environment with nonstationarity induced by agent interactions. To …