- Academic Search

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 220 - All 5 versions


[PDF] neurips.cc

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\varepsilon$-Greedy Exploration

S Zhang, H Li, M Wang, M Liu… - Advances in …, 2023 - proceedings.neurips.cc

This paper provides a theoretical understanding of deep Q-Network (DQN) with the
$\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous …

Cited by 23 - All 8 versions


[PDF] arxiv.org

The sample complexity of online contract design

B Zhu, S Bates, Z Yang, Y Wang, J Jiao… - arXiv preprint arXiv …, 2022 - arxiv.org

We study the hidden-action principal-agent problem in an online setting. In each round, the
principal posts a contract that specifies the payment to the agent based on each outcome …

Cited by 56 - All 7 versions


[PDF] mlr.press

Online learning in Stackelberg games with an omniscient follower

G Zhao, B Zhu, J Jiao, M Jordan - … Conference on Machine …, 2023 - proceedings.mlr.press

We study the problem of online learning in a two-player decentralized cooperative
Stackelberg game. In each round, the leader first takes an action, followed by the follower …

Cited by 23 - All 7 versions


[PDF] neurips.cc

MADE: Exploration via maximizing deviation from explored regions

T Zhang, P Rashidinejad, J Jiao… - Advances in …, 2021 - proceedings.neurips.cc

In online reinforcement learning (RL), efficient exploration remains particularly challenging
in high-dimensional environments with sparse rewards. In low-dimensional environments …

Cited by 51 - All 7 versions


[PDF] neurips.cc

Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration

F Liu, L Viano, V Cevher - Advances in Neural Information …, 2022 - proceedings.neurips.cc

This paper provides a theoretical study of deep neural function approximation in
reinforcement learning (RL) with the $\epsilon$-greedy exploration under the online setting …

Cited by 19 - All 11 versions


[PDF] arxiv.org

First steps toward understanding the extrapolation of nonlinear models to unseen domains

K Dong, T Ma - arXiv preprint arXiv:2211.11719, 2022 - arxiv.org

Real-world machine learning applications often involve deploying neural networks to
domains that are not seen in the training time. Hence, we need to understand the …

Cited by 22 - All 4 versions


[PDF] neurips.cc

Lifting the information ratio: An information-theoretic analysis of Thompson sampling for contextual bandits

G Neu, I Olkhovskaia, M Papini… - Advances in Neural …, 2022 - proceedings.neurips.cc

We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual
bandits with binary losses and adversarially-selected contexts. We adapt the information …

Cited by 20 - All 9 versions


[PDF] acm.org

Fast rates for nonparametric online learning: from realizability to learning in games

C Daskalakis, N Golowich - Proceedings of the 54th Annual ACM …, 2022 - dl.acm.org

We study fast rates of convergence in the setting of nonparametric online regression, namely
where regret is defined with respect to an arbitrary function class which has bounded …

Cited by 25 - All 5 versions


[PDF] neurips.cc

Representation learning beyond linear prediction functions

Z Xu, A Tewari - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

Recent papers on the theory of representation learning have shown the importance of a
quantity called diversity when generalizing from a set of source tasks to a target task. Most of …

Cited by 25 - All 8 versions


Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace...

The statistical complexity of interactive decision making
