Phi-3 technical report: A highly capable language model locally on your phone M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ... arXiv preprint arXiv:2404.14219, 2024 | 906 | 2024 |
Model-free reinforcement learning in infinite-horizon average-reward Markov decision processes CY Wei, MJ Jahromi, H Luo, H Sharma, R Jain International conference on machine learning, 10170-10180, 2020 | 123 | 2020 |
Evaluating cognitive maps and planning in large language models with CogEval I Momennejad, H Hasanbeig, F Vieira Frujeri, H Sharma, N Jojic, ... Advances in Neural Information Processing Systems 36, 69736-69751, 2023 | 63* | 2023 |
Fine-tuning language models with advantage-induced policy alignment B Zhu, H Sharma, FV Frujeri, S Dong, C Zhu, MI Jordan, J Jiao arXiv preprint arXiv:2306.02231, 2023 | 41 | 2023 |
Self-exploring language models: Active preference elicitation for online alignment S Zhang, D Yu, H Sharma, H Zhong, Z Liu, Z Yang, S Wang, H Hassan, ... arXiv preprint arXiv:2405.19332, 2024 | 27 | 2024 |
Language models can be logical solvers J Feng, R Xu, J Hao, H Sharma, Y Shen, D Zhao, W Chen Findings of the Association for Computational Linguistics: NAACL 2024, 2023 | 22* | 2023 |
A universal empirical dynamic programming algorithm for continuous state MDPs WB Haskell, R Jain, H Sharma, P Yu IEEE Transactions on Automatic Control 65 (1), 115-129, 2019 | 21 | 2019 |
Allure: A systematic protocol for auditing and improving LLM-based evaluation of text using iterative in-context-learning H Hasanbeig, H Sharma, L Betthauser, FV Frujeri, I Momennejad arXiv preprint arXiv:2309.13701 3, 2023 | 17 | 2023 |
Approximate relative value learning for average-reward continuous state MDPs H Sharma, M Jafarnia-Jahromi, R Jain Uncertainty in Artificial Intelligence, 956-964, 2020 | 16 | 2020 |
An empirical relative value learning algorithm for non-parametric MDPs with continuous state space H Sharma, R Jain, A Gupta 2019 18th European Control Conference (ECC), 1368-1373, 2019 | 13 | 2019 |
Randomized function fitting-based empirical value iteration WB Haskell, P Yu, H Sharma, R Jain 2017 IEEE 56th Annual Conference on Decision and Control (CDC), 2467-2472, 2017 | 9 | 2017 |
An approximately optimal relative value learning algorithm for averaged MDPs with continuous states and actions H Sharma, R Jain 2019 57th Annual Allerton Conference on Communication, Control, and …, 2019 | 7 | 2019 |
Phi-3 safety post-training: Aligning language models with a "break-fix" cycle E Haider, D Perez-Becker, T Portet, P Madan, A Garg, A Ashfaq, ... arXiv preprint arXiv:2407.13833, 2024 | 5 | 2024 |
Optimal spectrum sensing for cognitive radio with imperfect detector H Sharma, A Patel, SN Merchant, UB Desai 2014 IEEE 79th Vehicular Technology Conference (VTC Spring), 1-5, 2014 | 4 | 2014 |
An empirical dynamic programming algorithm for continuous MDPs WB Haskell, R Jain, H Sharma, P Yu arXiv preprint arXiv:1709.07506, 2017 | 3 | 2017 |
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning Y Chen, S Wang, Z Yang, H Sharma, N Karampatziakis, D Yu, ... arXiv preprint arXiv:2407.02119, 2024 | 1 | 2024 |
Finite Time Guarantees for Continuous State MDPs with Generative Model H Sharma, R Jain 2020 59th IEEE Conference on Decision and Control (CDC), 3617-3622, 2020 | 1 | 2020 |
Randomized Policy Learning for Continuous State and Action MDPs H Sharma, R Jain arXiv preprint arXiv:2006.04331, 2020 | 1 | 2020 |
Empirical algorithms for general stochastic systems with continuous states and actions H Sharma, R Jain, W Haskell 2019 IEEE 58th Conference on Decision and Control (CDC), 6344-6349, 2019 | 1 | 2019 |
QoS aware optimal base station ON/OFF policy and frequency planning H Sharma, V Vaid, P Chaporkar, GS Kasbekar Indian Inst. Technol. Bombay, 2015 | 1 | 2015 |