Zhang Zihan
Title
Cited by
Year
Almost optimal model-free reinforcement learning via reference-advantage decomposition
Z Zhang, Y Zhou, X Ji
Advances in Neural Information Processing Systems 33, 15198-15207, 2020
181 · 2020
Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping the curse of horizon
Z Zhang, X Ji, S Du
Conference on Learning Theory, 4528-4531, 2021
129 · 2021
Regret minimization for reinforcement learning by evaluating the optimal bias function
Z Zhang, X Ji
Advances in Neural Information Processing Systems 32, 2019
86 · 2019
Improved variance-aware confidence sets for linear bandits and linear mixture MDP
Z Zhang, J Yang, X Ji, SS Du
Advances in Neural Information Processing Systems 34, 4342-4355, 2021
69* · 2021
Near optimal reward-free reinforcement learning
Z Zhang, S Du, X Ji
International Conference on Machine Learning, 12402-12412, 2021
62* · 2021
Model-free reinforcement learning: from clipped pseudo-regret to sample complexity
Z Zhang, Y Zhou, X Ji
International Conference on Machine Learning, 12653-12662, 2021
43 · 2021
Horizon-free reinforcement learning in polynomial time: the power of stationary policies
Z Zhang, X Ji, S Du
Conference on Learning Theory, 3858-3904, 2022
29 · 2022
Settling the sample complexity of online reinforcement learning
Z Zhang, Y Chen, JD Lee, SS Du
The Thirty Seventh Annual Conference on Learning Theory, 5213-5219, 2024
24 · 2024
Sharper model-free reinforcement learning for average-reward Markov decision processes
Z Zhang, Q Xie
The Thirty Sixth Annual Conference on Learning Theory, 5476-5477, 2023
17 · 2023
Optimal multi-distribution learning
Z Zhang, W Zhan, Y Chen, SS Du, JD Lee
The Thirty Seventh Annual Conference on Learning Theory, 5220-5223, 2024
16 · 2024
Near-optimal regret bounds for multi-batch reinforcement learning
Z Zhang, Y Jiang, Y Zhou, X Ji
Advances in Neural Information Processing Systems 35, 24586-24596, 2022
15 · 2022
Sharp variance-dependent bounds in reinforcement learning: Best of both worlds in stochastic and deterministic environments
R Zhou, Z Zihan, SS Du
International Conference on Machine Learning, 42878-42914, 2023
12 · 2023
Almost optimal batch-regret tradeoff for batch linear contextual bandits
Z Zhang, X Ji, Y Zhou
arXiv preprint arXiv:2110.08057, 2021
9 · 2021
Achieving tractable minimax optimal regret in average reward MDPs
V Boone, Z Zhang
arXiv preprint arXiv:2406.01234, 2024
5 · 2024
Horizon-free regret for linear Markov decision processes
Z Zhang, JD Lee, Y Chen, SS Du
arXiv preprint arXiv:2403.10738, 2024
3 · 2024
Anytime Acceleration of Gradient Descent
Z Zhang, JD Lee, SS Du, Y Chen
arXiv preprint arXiv:2411.17668, 2024
1 · 2024
Articles 1–16