Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Learning to identify critical states for reinforcement learning from videos

H Liu, M Zhuge, B Li, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic
information about good policies can be extracted from offline data which lack explicit …

A survey of temporal credit assignment in deep reinforcement learning

E Pignatelli, J Ferret, M Geist, T Mesnard… - arXiv preprint arXiv …, 2023 - arxiv.org
The Credit Assignment Problem (CAP) refers to the longstanding challenge of
Reinforcement Learning (RL) agents to associate actions with their long-term …

Learning useful representations of recurrent neural network weight matrices

V Herrmann, F Faccio, J Schmidhuber - arXiv preprint arXiv:2403.11998, 2024 - arxiv.org
Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The
program of an RNN is its weight matrix. How to learn useful representations of RNN weights …

Goal-conditioned generators of deep policies

F Faccio, V Herrmann, A Ramesh, L Kirsch… - Proceedings of the …, 2023 - ojs.aaai.org
Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies,
given goals encoded in special command inputs. Here we study goal-conditioned neural …

What about inputting policy in value function: Policy representation and policy-extended value function approximator

H Tang, Z Meng, J Hao, C Chen, D Graves… - Proceedings of the …, 2022 - ojs.aaai.org
We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement
Learning (RL), which extends conventional value function approximator (VFA) to take as …

Learning Efficient Truthful Mechanisms for Trading Networks

T Osogami, S Wasserkrug, ES Shamash - IJCAI, 2023 - ijcai.org
Trading networks are an indispensable part of today's economy, but to compete successfully
with others, they must be efficient in maximizing the value they provide to the external …

General policy evaluation and improvement by learning to identify few but crucial states

F Faccio, A Ramesh, V Herrmann, J Harb… - arXiv preprint arXiv …, 2022 - arxiv.org
Learning to evaluate and improve policies is a core problem of Reinforcement Learning
(RL). Traditional RL algorithms learn a value function defined for a single policy. A recently …

Exploring through random curiosity with general value functions

A Ramesh, L Kirsch, S van Steenkiste… - Advances in Neural …, 2022 - proceedings.neurips.cc
Efficient exploration in reinforcement learning is a challenging problem commonly
addressed through intrinsic rewards. Recent prominent approaches are based on state …

Learning one abstract bit at a time through self-invented experiments encoded as neural networks

V Herrmann, L Kirsch, J Schmidhuber - International Workshop on Active …, 2023 - Springer
There are two important things in science: (A) Finding answers to given questions, and (B)
Coming up with good questions. Our artificial scientists not only learn to answer given …