Human-timescale adaptation in an open-ended task space
Foundation models have shown impressive adaptation and scalability in supervised and self-
supervised learning problems, but so far these successes have not fully translated to …
Reinforcement learning: An overview
K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …
Meta-Explore: Exploratory hierarchical vision-and-language navigation using scene object spectrum grounding
The main challenge in vision-and-language navigation (VLN) is how to understand natural-
language instructions in an unseen environment. The main limitation of conventional VLN …
A mixture of surprises for unsupervised reinforcement learning
Unsupervised reinforcement learning aims at learning a generalist policy in a reward-free
manner for fast adaptation to downstream tasks. Most of the existing methods propose to …
Planning goals for exploration
Dropped into an unknown environment, what should an agent do to quickly learn about the
environment and how to accomplish diverse tasks within it? We address this question within …
The phenomenon of policy churn
We identify and study the phenomenon of policy churn, that is, the rapid change of the
greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly …
Boundless Socratic Learning with Language Games
T Schaul - arXiv preprint arXiv:2411.16905, 2024 - arxiv.org
An agent trained within a closed system can master any desired capability, as long as the
following three conditions hold: (a) it receives sufficiently informative and aligned …
DEP-RL: Embodied exploration for reinforcement learning in overactuated and musculoskeletal systems
Muscle-actuated organisms are capable of learning an unparalleled diversity of dexterous
movements despite their vast amount of muscles. Reinforcement learning (RL) on large …
WToE: Learning when to explore in multiagent reinforcement learning
Existing multiagent exploration works focus on how to explore in the fully cooperative task,
which is insufficient in the environment with nonstationarity induced by agent interactions. To …