A. Rupam Mahmood
University of Alberta, Amii
Verified email at ualberta.ca
Title
Cited by
Year
An emphatic approach to the problem of off-policy temporal-difference learning
RS Sutton, AR Mahmood, M White
(JMLR) Journal of Machine Learning Research 17, 2016
Cited by: 328; Year: 2016
Benchmarking reinforcement learning algorithms on real-world robots
AR Mahmood, D Korenkevych, G Vasan, W Ma, J Bergstra
(CoRL) Proceedings of the 2nd Annual Conference on Robot Learning, 2018
Cited by: 235; Year: 2018
Weighted importance sampling for off-policy learning with linear function approximation
AR Mahmood, H van Hasselt, RS Sutton
(NeurIPS) Advances in Neural Information Processing Systems 27, 2014
Cited by: 191; Year: 2014
True online temporal-difference learning
H van Seijen, AR Mahmood, PM Pilarski, MC Machado, RS Sutton
(JMLR) Journal of Machine Learning Research 17, 2016
Cited by: 125; Year: 2016
Setting up a reinforcement learning task with a real-world robot
AR Mahmood, D Korenkevych, BJ Komer, J Bergstra
(IROS) 2018 IEEE/RSJ International Conference on Intelligent Robots and …, 2018
Cited by: 106; Year: 2018
Tuning-free step-size adaptation
AR Mahmood, RS Sutton, T Degris, PM Pilarski
(ICASSP) Acoustics, Speech and Signal Processing, 2012 IEEE International …, 2012
Cited by: 89; Year: 2012
Continual backprop: Stochastic gradient descent with persistent randomness
S Dohare, RS Sutton, AR Mahmood
arXiv preprint arXiv:2108.06325, 2021
Cited by: 79; Year: 2021
Loss of plasticity in deep continual learning
S Dohare, JF Hernandez-Garcia, Q Lan, P Rahman, AR Mahmood, ...
Nature 632 (8026), 768-774, 2024
Cited by: 59; Year: 2024
Multi-step off-policy learning without importance sampling ratios
AR Mahmood, H Yu, RS Sutton
arXiv preprint arXiv:1702.03006, 2017
Cited by: 55; Year: 2017
Representation Search through Generate and Test
AR Mahmood, RS Sutton
Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013
Cited by: 50; Year: 2013
Off-policy TD(λ) with a true online equivalence
H van Hasselt, AR Mahmood, RS Sutton
(UAI) Proceedings of the 30th Conference on Uncertainty in Artificial …, 2014
Cited by: 48; Year: 2014
On generalized Bellman equations and temporal-difference learning
H Yu, AR Mahmood, RS Sutton
(JMLR) The Journal of Machine Learning Research 19 (1), 1864-1912, 2018
Cited by: 42; Year: 2018
A new Q(λ) with interim forward view and Monte Carlo equivalence
RS Sutton, AR Mahmood, D Precup, H van Hasselt
(ICML) International Conference on Machine Learning, 2014
Cited by: 40; Year: 2014
Emphatic temporal-difference learning
AR Mahmood, H Yu, M White, RS Sutton
European Workshops on Reinforcement Learning, 2015
Cited by: 39; Year: 2015
Greedification operators for policy optimization: investigating forward and reverse KL divergences
A Chan, H Silva, S Lim, T Kozuno, AR Mahmood, M White
(JMLR) Journal of Machine Learning Research, 2022
Cited by: 35; Year: 2022
Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning
M Elsayed, AR Mahmood
(ICLR) The Twelfth International Conference on Learning Representations, 2024
Cited by: 32*; Year: 2024
Off-policy learning based on weighted importance sampling with linear computational complexity
AR Mahmood, RS Sutton
(UAI) Proceedings of the 31st Conference on Uncertainty in Artificial …, 2015
Cited by: 32; Year: 2015
Maintaining plasticity in deep continual learning
S Dohare, JF Hernandez-Garcia, P Rahman, AR Mahmood, RS Sutton
arXiv preprint arXiv:2306.13812, 2023
Cited by: 29; Year: 2023
Autoregressive policies for continuous control deep reinforcement learning
D Korenkevych, AR Mahmood, G Vasan, J Bergstra
(IJCAI) Proceedings of the 28th International Joint Conference on Artificial …, 2019
Cited by: 29; Year: 2019
Incremental Off-policy Reinforcement Learning Algorithms
A Mahmood
University of Alberta, 2017
Cited by: 22; Year: 2017
Articles 1–20