The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 217 · All 5 versions


[PDF] arxiv.org

Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective

DJ Foster, A Rakhlin, D Simchi-Levi, Y Xu - arXiv preprint arXiv …, 2020 - arxiv.org

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved
performance on "easy" problems with a gap between the best and second-best arm. Are …

Cited by 104 · All 4 versions


[PDF] mlr.press

Streaming active learning with deep neural networks

A Saran, S Yousefi, A Krishnamurthy… - International …, 2023 - proceedings.mlr.press

Active learning is perhaps most naturally posed as an online learning problem. However,
prior active learning approaches with deep neural networks assume offline access to the …

Cited by 20 · All 7 versions


[PDF] neurips.cc

Contextual bandits and imitation learning with preference-based active queries

A Sekhari, K Sridharan, W Sun… - Advances in Neural …, 2023 - proceedings.neurips.cc

We consider the problem of contextual bandits and imitation learning, where the learner
lacks direct knowledge of the executed action's reward. Instead, the learner can actively …

Cited by 19 · All 8 versions


[PDF] arxiv.org

Making RL with preference-based feedback efficient via randomization

R Wu, W Sun - arXiv preprint arXiv:2310.14554, 2023 - arxiv.org

Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …

Cited by 24 · All 4 versions


[PDF] arxiv.org

Reinforcement learning from human feedback with active queries

K Ji, J He, Q Gu - arXiv preprint arXiv:2402.09401, 2024 - arxiv.org

Aligning large language models (LLM) with human preference plays a key role in building
modern generative models and can be achieved by reinforcement learning from human …

Cited by 19 · All 3 versions

Recent advances in scaling‐down sampling methods in machine learning

A ElRafey, J Wojtusiak - Wiley Interdisciplinary Reviews …, 2017 - Wiley Online Library

Data sampling methods have been investigated for decades in the context of machine
learning and statistical algorithms, with significant progress made in the past few years …

Cited by 34 · All 2 versions


[PDF] jmlr.org

Active learning for cost-sensitive classification

A Krishnamurthy, A Agarwal, TK Huang… - Journal of Machine …, 2019 - jmlr.org

We design an active learning algorithm for cost-sensitive multiclass classification: problems
where different errors have different costs. Our algorithm, COAL, makes predictions by …

Cited by 111 · All 8 versions


[PDF] neurips.cc

Towards a unified information-theoretic framework for generalization

M Haghifam, GK Dziugaite… - Advances in Neural …, 2021 - proceedings.neurips.cc

In this work, we investigate the expressiveness of the "conditional mutual information" (CMI)
framework of Steinke and Zakynthinou (2020) and the prospect of using it to provide a …

Cited by 41 · All 8 versions


[PDF] neurips.cc

Adversarially robust learning: A generic minimax optimal learner and characterization

O Montasser, S Hanneke… - Advances in Neural …, 2022 - proceedings.neurips.cc

We present a minimax optimal learner for the problem of learning predictors robust to
adversarial examples at test-time. Interestingly, we find that this requires new algorithmic …

Cited by 23 · All 5 versions
