Is Q-learning provably efficient?
Model-free reinforcement learning (RL) algorithms directly parameterize and
update value functions or policies, bypassing the modeling of the environment. They are …
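To make the "directly parameterize and update value functions" part concrete, here is a minimal sketch of a tabular, step-indexed Q-learning update in an episodic MDP. The learning-rate schedule and the optional optimism bonus are illustrative assumptions for the sketch, not a statement of the paper's exact algorithm.

```python
import numpy as np

def q_learning_step(Q, counts, t, s, a, r, s_next, H, c_bonus=1.0):
    """One model-free update of a step-indexed Q table (illustrative sketch).

    Q      : (H, S, A) action-value estimates, one table per step of the episode
    counts : (H, S, A) visit counts driving the step size and the bonus
    t      : current step within the episode (0-based)
    """
    counts[t, s, a] += 1
    k = counts[t, s, a]
    alpha = (H + 1) / (H + k)              # rescaled-linear step size (assumed)
    bonus = c_bonus * np.sqrt(H / k)       # optional optimism bonus (illustrative)
    v_next = 0.0 if t + 1 >= H else Q[t + 1, s_next].max()
    Q[t, s, a] = (1 - alpha) * Q[t, s, a] + alpha * (r + v_next + bonus)
    return Q
```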
Provably efficient reinforcement learning with linear function approximation
Modern Reinforcement Learning (RL) is commonly applied to practical problems
with an enormous number of states, where function approximation must be deployed …
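As a concrete illustration of what deploying function approximation can look like, here is a hedged sketch of a linear Q model fit by ridge regression on one-step targets, in the spirit of least-squares value iteration; the feature map, regression targets, and regularizer are assumptions of the sketch rather than the paper's algorithm.

```python
import numpy as np

def fit_linear_q(phi, rewards, next_values, lam=1.0):
    """Fit Q(s, a) ≈ phi(s, a) @ w by regularized least squares (sketch).

    phi         : (n, d) state-action features of observed transitions
    rewards     : (n,) observed rewards
    next_values : (n,) estimates of max_a' Q(s', a') at the next states
    """
    targets = rewards + next_values                  # one-step regression targets
    d = phi.shape[1]
    A = phi.T @ phi + lam * np.eye(d)                # ridge-regularized Gram matrix
    return np.linalg.solve(A, phi.T @ targets)

def linear_q_value(w, phi_sa):
    """Evaluate the linear approximation at a single (s, a) feature vector."""
    return phi_sa @ w
```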
Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
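A rough sketch of the pessimism idea the title points to: when learning from a fixed batch, subtract a coverage-based penalty from the backup target so that state-action pairs the data barely visits are not over-valued. The penalty shape and constants below are assumptions for illustration, not the paper's bonus.

```python
import numpy as np

def pessimistic_backup(Q, counts, t, s, a, r, s_next, H, c_pen=1.0):
    """One pessimistic backup from an offline transition (illustrative sketch).

    Mirrors a Q-learning update, but subtracts a penalty that grows when the
    (step, state, action) triple is poorly covered by the batch, clipping at zero.
    """
    counts[t, s, a] += 1
    k = counts[t, s, a]
    alpha = (H + 1) / (H + k)               # step size, assumed for the sketch
    penalty = c_pen * np.sqrt(H / k)        # coverage penalty (illustrative)
    v_next = 0.0 if t + 1 >= H else Q[t + 1, s_next].max()
    target = max(r + v_next - penalty, 0.0)
    Q[t, s, a] = (1 - alpha) * Q[t, s, a] + alpha * target
    return Q
```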
Provably efficient exploration in policy optimization
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …
Sample-optimal parametric Q-learning using linearly additive features
Consider a Markov decision process (MDP) that admits a set of state-action features, which
can linearly express the process's probabilistic transition model. We propose a parametric Q …
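The structural assumption in the snippet can be read as P(s' | s, a) = Σ_k φ_k(s, a) ψ_k(s'), so expectations of any value function over next states reduce to an inner product with a K-dimensional weight vector. Below is a hedged sketch of evaluating such a factored transition model and the parametric Q values it induces; the array shapes and the discounted form of Q are assumptions for illustration.

```python
import numpy as np

def transition_probs(phi_sa, psi):
    """P(. | s, a) = sum_k phi_k(s, a) * psi_k(.), the linearly additive model (sketch).

    phi_sa : (K,) feature vector of the pair (s, a)
    psi    : (K, S) array whose k-th row is a factor over next states
    """
    return phi_sa @ psi                     # (S,) next-state probability vector

def parametric_q(phi_sa, r_sa, w, gamma=0.99):
    """Q(s, a) = r(s, a) + gamma * phi(s, a) @ w, where w = psi @ V collapses
    the expectation E[V(s') | s, a] under the factored model."""
    return r_sa + gamma * phi_sa @ w
```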
Minimum cost flows, MDPs, and ℓ1-regression in nearly linear time for dense instances
In this paper we provide new randomized algorithms with improved runtimes for solving
linear programs with two-sided constraints. In the special case of the minimum cost flow …
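For context on the problem class named in the title, a linear program with two-sided constraints can be written as

\[ \min_{x \in \mathbb{R}^m} \; c^{\top} x \quad \text{subject to} \quad A x = b, \qquad \ell \le x \le u, \]

with minimum cost flow arising as the special case where $A$ is the node-arc incidence matrix of the graph, $b$ the vertex demands, and $\ell, u$ the arc capacity bounds. This is standard background on the problem class, not a description of the paper's algorithm.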
Almost optimal model-free reinforcement learning via reference-advantage decomposition
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …
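A loose sketch of the decomposition named in the title: the one-step backup target is split into a reference part, built from a slowly changing reference value function, and a small advantage part that corrects it; the two parts can then be averaged with different weights to reduce variance. The shapes and the frozen-reference convention below are assumptions of the sketch, and the paper's actual weighting scheme is omitted.

```python
def reference_advantage_target(Q, V_ref, t, r, s_next, H):
    """Split a backup target into reference and advantage terms (illustrative sketch).

    Q     : (H, S, A) current action-value estimates
    V_ref : (H + 1, S) reference values, kept fixed once judged accurate enough
    """
    v_next = 0.0 if t + 1 >= H else Q[t + 1, s_next].max()
    v_ref_next = 0.0 if t + 1 >= H else V_ref[t + 1, s_next]
    reference_term = r + v_ref_next          # large but nearly static component
    advantage_term = v_next - v_ref_next     # small, still-changing correction
    return reference_term, advantage_term
```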
Interdisciplinary Perspectives on Agent-Based Modeling in the Architecture, Engineering, and Construction Industry: A Comprehensive Review
S Mazzetto - Buildings, 2024 - mdpi.com
This paper explores the transformative impact of agent-based modeling (ABM) on the
architecture, engineering, and construction (AEC) industry, highlighting its indispensable …
Near-optimal time and sample complexities for solving Markov decision processes with a generative model
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a
discounted Markov Decision Process (DMDP) provided we can only access its transition …
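The generative-model access referred to here means the learner may request independent samples s' ~ P(· | s, a) for any state-action pair of its choosing. Below is a minimal sketch of the basic primitive this enables, a Monte Carlo estimate of one Bellman backup; the sampler interface, sample count, and discount are assumptions of the sketch.

```python
import numpy as np

def sampled_backup(sample_next_state, r_sa, V, s, a, n_samples=100, gamma=0.9):
    """Monte Carlo estimate of r(s, a) + gamma * E[V(s') | s, a] (sketch).

    sample_next_state : callable (s, a) -> s', the assumed generative-model interface
    V                 : (S,) current value estimates indexed by state
    """
    draws = [V[sample_next_state(s, a)] for _ in range(n_samples)]
    return r_sa + gamma * float(np.mean(draws))
```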
Model-based reinforcement learning with a generative model is minimax optimal
This work considers the sample and computational complexity of obtaining an $\epsilon$-
optimal policy in a discounted Markov Decision Process (MDP), given only access to a …
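A minimal sketch of the model-based (plug-in) recipe the title refers to: draw a fixed number of generative-model samples per state-action pair, form the empirical transition matrix, and run value iteration on that empirical MDP. The sample budget, known reward matrix, and iteration count are assumptions of the sketch, not the paper's analysis.

```python
import numpy as np

def plug_in_policy(sample_next_state, R, S, A, n_per_pair=100, gamma=0.9, iters=1000):
    """Estimate an empirical MDP from generative samples and plan in it (sketch).

    sample_next_state : callable (s, a) -> s', assumed generative-model interface
    R                 : (S, A) reward matrix, assumed known for simplicity
    """
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n_per_pair):
                P_hat[s, a, sample_next_state(s, a)] += 1
    P_hat /= n_per_pair                       # empirical transition probabilities
    V = np.zeros(S)
    for _ in range(iters):                    # value iteration on the empirical model
        Q = R + gamma * P_hat @ V             # (S, A) plug-in Bellman backup
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                   # greedy policy w.r.t. the empirical MDP
```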