- Academic Search

A simple and provably efficient algorithm for asynchronous federated contextual linear bandits

J He, T Wang, Y Min, Q Gu - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We study federated contextual linear bandits, where $M$ agents cooperate with each other
to solve a global contextual linear bandit problem with the help of a central server. We …

Cited by 34 · 6 versions

Contextual bandits with large action spaces: Made practical

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press

A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

Cited by 39 · 3 versions

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc

In some applications of reinforcement learning, a dataset of pre-collected experience is
already available but it is also possible to acquire some additional online data to help …

Cited by 7 · 6 versions

An exponential lower bound for linearly realizable MDP with constant suboptimality gap

Y Wang, R Wang, S Kakade - Advances in Neural …, 2021 - proceedings.neurips.cc

A fundamental question in the theory of reinforcement learning is: suppose the optimal
$Q$-function lies in the linear span of a given $d$ dimensional feature map, is sample …

Cited by 53 · 11 versions

Provably efficient reinforcement learning with linear function approximation under adaptivity constraints

T Wang, D Zhou, Q Gu - Advances in Neural Information …, 2021 - proceedings.neurips.cc

We study reinforcement learning (RL) with linear function approximation under the adaptivity
constraint. We consider two popular limited adaptivity models: the batch learning model and …

Cited by 50 · 11 versions

Impact of representation learning in linear bandits

J Yang, W Hu, JD Lee, SS Du - arXiv preprint arXiv:2010.06531, 2020 - arxiv.org

We study how representation learning can improve the efficiency of bandit problems. We
study the setting where we play $T$ linear bandits with dimension $d$ concurrently, and …

Cited by 51 · 3 versions

Near-optimal regret bounds for multi-batch reinforcement learning

Z Zhang, Y Jiang, Y Zhou, X Ji - Advances in Neural …, 2022 - proceedings.neurips.cc

In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-
horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The …

Cited by 14 · 9 versions

Efficient batched algorithm for contextual linear bandits with large action space via soft elimination

O Hanna, L Yang, C Fragouli - Advances in Neural …, 2024 - proceedings.neurips.cc

In this paper, we provide the first efficient batched algorithm for contextual linear bandits with
large action spaces. Unlike existing batched algorithms that rely on action elimination, which …

Cited by 7 · 6 versions

Experiment planning with function approximation

A Pacchiano, J Lee, E Brunskill - Advances in Neural …, 2024 - proceedings.neurips.cc

We study the problem of experiment planning with function approximation in contextual
bandit problems. In settings where there is a significant overhead to deploying adaptive …

Cited by 5 · 6 versions

Cooperative multi-agent reinforcement learning: Asynchronous communication and linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press

We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …

Cited by 9 · 7 versions
