Dive into Deep Learning
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …
Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
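The "regression oracles" of the title refer to reductions that turn a reward-regression routine into an exploration strategy. Below is a minimal sketch of inverse gap weighting, the action-sampling rule this line of work is built around; the oracle output `predicted_rewards` and the exploration parameter `gamma` are illustrative placeholders, not the paper's exact procedure:

    import numpy as np

    def igw_action_distribution(predicted_rewards, gamma):
        # Inverse gap weighting: convert a regression oracle's per-action
        # reward estimates into a sampling distribution. Actions whose
        # estimated gap to the best action is large get little probability.
        K = len(predicted_rewards)
        best = int(np.argmax(predicted_rewards))
        gaps = predicted_rewards[best] - predicted_rewards
        probs = 1.0 / (K + gamma * gaps)
        probs[best] = 0.0
        probs[best] = 1.0 - probs.sum()  # leftover mass on the greedy action
        return probs

    # Example: oracle predicts rewards [0.9, 0.5, 0.1] for three actions.
    p = igw_action_distribution(np.array([0.9, 0.5, 0.1]), gamma=10.0)
    action = np.random.choice(len(p), p=p)

With a regression oracle in hand, each round costs one oracle prediction and one arg-max, which is how the computational overhead stays comparable to supervised learning.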
Sample complexity of reinforcement learning using linearly combined model ensembles
Reinforcement learning (RL) methods have been shown to be capable of learning intelligent
behavior in rich domains. However, this has largely been done in simulated domains without …
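For concreteness, a linearly combined model ensemble posits that the unknown transition kernel is a weighted mixture of known base models. A toy tabular illustration, assuming dense kernels of shape (S, A, S) purely for exposition:

    import numpy as np

    def combined_transition(base_models, weights, s, a):
        # P(. | s, a) = sum_i w_i * P_i(. | s, a), with weights w_i >= 0
        # summing to 1; learning reduces to estimating the weight vector.
        return sum(w * P[s, a] for w, P in zip(weights, base_models))

The appeal is sample efficiency: the learner estimates a handful of mixture weights rather than a full transition model.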
Adapting to misspecification in contextual bandits
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …
A model selection approach for corruption robust reinforcement learning
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …
Model selection in contextual stochastic bandit problems
A Pacchiano, M Phan… - Advances in …, 2020 - proceedings.neurips.cc
We study bandit model selection in stochastic environments. Our approach relies on a
master algorithm that selects between candidate base algorithms. We develop a master …
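As a rough illustration of the master/base architecture, here is a corralling-style master in the spirit of this literature; the smoothing and learning-rate details of the paper's actual master differ, and the base interface (`select`/`update`) is an assumed placeholder:

    import numpy as np

    class Exp3Master:
        # Toy master: sample a base algorithm each round, let it act,
        # then update importance-weighted scores over the bases.
        def __init__(self, bases, eta=0.1):
            self.bases = bases
            self.eta = eta
            self.scores = np.zeros(len(bases))

        def probs(self):
            w = np.exp(self.scores - self.scores.max())
            return w / w.sum()

        def step(self, context):
            p = self.probs()
            i = int(np.random.choice(len(self.bases), p=p))
            return i, self.bases[i].select(context), p[i]

        def update(self, i, p_i, context, action, reward):
            self.bases[i].update(context, action, reward)
            self.scores[i] += self.eta * reward / p_i  # importance weighting

Without extra smoothing, a naive master can starve base algorithms of plays, which is precisely the difficulty the model-selection literature addresses.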
Hedging the drift: Learning to optimize under nonstationarity
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic
regret bounds for a collection of nonstationary stochastic bandit settings. These settings …
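To make the nonstationary setting concrete, a standard baseline (not claimed to be this paper's algorithm) is sliding-window UCB, which forgets observations older than a fixed window so estimates can track a drifting mean:

    import numpy as np
    from collections import deque

    class SlidingWindowUCB:
        # UCB statistics over only the most recent `window` plays, so
        # stale observations age out as the reward distribution drifts.
        def __init__(self, n_arms, window=500, c=2.0):
            self.history = deque(maxlen=window)  # (arm, reward) pairs
            self.n_arms = n_arms
            self.c = c

        def select(self):
            counts = np.zeros(self.n_arms)
            sums = np.zeros(self.n_arms)
            for arm, r in self.history:
                counts[arm] += 1
                sums[arm] += r
            t = max(len(self.history), 1)
            safe = np.maximum(counts, 1)
            ucb = np.where(counts > 0,
                           sums / safe + np.sqrt(self.c * np.log(t) / safe),
                           np.inf)  # force exploration of arms unseen in the window
            return int(np.argmax(ucb))

        def update(self, arm, reward):
            self.history.append((arm, reward))

The window length trades reactivity to drift against estimation variance, the core tension that dynamic-regret analyses quantify.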
Oracle inequalities for model selection in offline reinforcement learning
In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good
policy without interacting with the environment. A major challenge in applying such methods …
Dynamic balancing for model selection in bandits and RL
We propose a framework for model selection by combining base algorithms in stochastic
bandits and reinforcement learning. We require a candidate regret bound for each base …
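The balancing idea can be sketched in a few lines: each base carries a candidate regret bound, the master plays whichever active base currently claims the smallest bound, and a base is eliminated when its realized reward falls detectably short of its own claim. Confidence widths and other details of the paper's test are omitted; this is an illustrative skeleton only:

    def balance_step(plays, rewards, bound_fns, active):
        # plays[i]: times base i was selected; rewards[i]: its cumulative
        # reward; bound_fns[i](n): base i's candidate regret bound at n plays.
        means = {i: rewards[i] / max(plays[i], 1) for i in active}
        best_mean = max(means.values())
        # misspecification test (confidence terms omitted): drop bases
        # underperforming their own candidate bound.
        active = {i for i in active
                  if means[i] + bound_fns[i](max(plays[i], 1)) / max(plays[i], 1)
                  >= best_mean}
        # balance: play the active base with the smallest putative bound.
        chosen = min(active, key=lambda i: bound_fns[i](plays[i] + 1))
        return chosen, active

A well-specified base survives such a test with high probability, so the master's regret tracks the best candidate bound up to the cost of balancing.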
Artificial intelligence for materials research at extremes
Materials development is slow and expensive, taking decades from inception to fielding. For
materials research at extremes, the situation is even more demanding, as the desired …