Proximal policy optimization algorithms J Schulman, F Wolski, P Dhariwal, A Radford, O Klimov arXiv preprint arXiv:1707.06347, 2017 | 24084 | 2017 |
Hindsight experience replay M Andrychowicz, F Wolski, A Ray, J Schneider, R Fong, P Welinder, ... Advances in neural information processing systems 30, 2017 | 3140 | 2017 |
Dota 2 with large scale deep reinforcement learning C Berner, G Brockman, B Chan, V Cheung, P Dębiak, C Dennison, ... arXiv preprint arXiv:1912.06680, 2019 | 2118 | 2019 |
Evolved policy gradients R Houthooft, Y Chen, P Isola, B Stadie, F Wolski, OAI Jonathan Ho, ... Advances in Neural Information Processing Systems 31, 2018 | 293 | 2018 |
Dota 2 with large scale deep reinforcement learning G Brockman, B Chan, V Cheung, P Debiak, C Dennison, D Farhi, ... arXiv preprint arXiv:1912.06680, 2019 | 123 | 2019 |
Advances in neural information processing systems M Andrychowicz, F Wolski, A Ray, J Schneider, R Fong, P Welinder, ... Proceedings of the 30th International Conference on Neural Information …, 2017 | 69 | 2017 |
Dota 2 with large scale deep reinforcement learning C Berner, G Brockman, B Chan, V Cheung, P Debiak, C Dennison, ... arXiv preprint arXiv:1912.06680, 2019 | 64 | 2019 |
Proximal policy optimization algorithms J Schulman, F Wolski, P Dhariwal, A Radford, O Klimov 1-12, 2017 | 54 | 2017 |
Proximal Policy Optimization Algorithms, August 2017 J Schulman, F Wolski, P Dhariwal, A Radford, O Klimov arXiv preprint arXiv:1707.06347, 2017 | 50 | 2017 |
Proximal policy optimization J Schulman, O Klimov, F Wolski, P Dhariwal, A Radford arXiv preprint arXiv:1707.06347, 1-12, 2017 | 41 | 2017 |
Long-term planning and situational awareness in openai five J Raiman, S Zhang, F Wolski arXiv preprint arXiv:1912.06721, 2019 | 18 | 2019 |
Evolved Policy Gradients: Supplementary Materials R Houthooft, RY Chen, P Isola, BC Stadie, F Wolski, J Ho, P Abbeel | | |