Learning adversarial Markov decision processes with bandit feedback and unknown transition C Jin, T Jin, H Luo, S Sra, T Yu International Conference on Machine Learning, 4860-4869, 2020 | 154* | 2020 |
Deep reinforcement learning for multi-driver vehicle dispatching and repositioning problem J Holler, R Vuorio, Z Qin, X Tang, Y Jiao, T Jin, S Singh, C Wang, J Ye 2019 IEEE International Conference on Data Mining (ICDM), 1090-1095, 2019 | 138 | 2019 |
Simultaneously learning stochastic and adversarial episodic MDPs with known transition T Jin, H Luo Advances in Neural Information Processing Systems 33, 16557-16566, 2020 | 64 | 2020 |
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition T Jin, L Huang, H Luo Advances in Neural Information Processing Systems 34, 20491-20502, 2021 | 51 | 2021 |
Boosting dynamic programming with neural networks for solving NP-hard problems F Yang, T Jin, TY Liu, X Sun, J Zhang Asian Conference on Machine Learning, 726-739, 2018 | 28 | 2018 |
Near-optimal regret for adversarial MDP with delayed bandit feedback T Jin, T Lancewicki, H Luo, Y Mansour, A Rosenberg Advances in Neural Information Processing Systems 35, 33469-33481, 2022 | 26 | 2022 |
Learning adversarial MDPs with bandit feedback and unknown transition C Jin, T Jin, H Luo, S Sra, T Yu arXiv preprint arXiv:1912.01192, 2019 | 21 | 2019 |
Improved best-of-both-worlds guarantees for multi-armed bandits: FTRL with general regularizers and multiple optimal arms T Jin, J Liu, H Luo Advances in Neural Information Processing Systems 36, 30918-30978, 2023 | 20 | 2023 |
No-regret online reinforcement learning with adversarial losses and transitions T Jin, J Liu, C Rouyer, W Chang, CY Wei, H Luo Advances in Neural Information Processing Systems 36, 2024 | 12 | 2024 |
Learning adversarial MDPs with bandit feedback and unknown transition C Jin, T Jin, H Luo, S Sra, T Yu 2019 | 5 | 2019 |
Robust and Adaptive Online Reinforcement Learning T Jin University of Southern California, 2024 | | 2024 |
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback A Rosenberg, H Luo, T Jin, Y Mansour | | 2022 |