Google Scholar

Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org

Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …

Cited by 235. All 17 versions.

Neural policy gradient methods: Global optimality and rates of convergence

L Wang, Q Cai, Z Yang, Z Wang - ar**

V Dewanto, G Dunn, A Eshragh, M Gallagher… - arXiv preprint arXiv …, 2020 - arxiv.org

Reinforcement learning is an important part of artificial intelligence. In this paper, we review
model-free reinforcement learning that utilizes the average reward optimality criterion in the …

Cited by 31. All 2 versions.

Distributed learning in the nonconvex world: From batch data to streaming and beyond

TH Chang, M Hong, HT Wai… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org

Distributed learning has become a critical enabler of the massively connected world that
many people envision. This article discusses four key elements of scalable distributed …

Cited by 107. All 9 versions.

On the bias-variance-cost tradeoff of stochastic optimization

Y Hu, X Chen, N He - Advances in Neural Information …, 2021 - proceedings.neurips.cc

We consider stochastic optimization when one only has access to biased stochastic oracles
of the objective, and obtaining stochastic gradients with low biases comes at high costs. This …

Cited by 46. All 10 versions.

Biased stochastic first-order methods for conditional stochastic optimization and applications in meta learning

Y Hu, S Zhang, X Chen, N He - Advances in Neural …, 2020 - proceedings.neurips.cc

Conditional stochastic optimization covers a variety of applications ranging from invariant
learning and causal inference to meta-learning. However, constructing unbiased gradient …

Cited by 68. All 11 versions.

Non-asymptotic convergence analysis of two time-scale (natural) actor-critic algorithms

T Xu, Z Wang, Y Liang - arXiv preprint arXiv:2005.03557, 2020 - arxiv.org

As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-
critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first …

Cited by 68. All 3 versions.

Multi-agent performative prediction with greedy deployment and consensus seeking agents

Q Li, CY Yau, HT Wai - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We consider a scenario where multiple agents are learning a common decision vector from
data which can be influenced by the agents' decisions. This leads to the problem of multi …

Cited by 25. All 6 versions.

Non-asymptotic analysis of biased stochastic approximation scheme

Fast global convergence of natural policy gradient methods with entropy regularization