Hiteshi Sharma
Verified email at microsoft.com
Title
Cited by
Year
Phi-3 technical report: A highly capable language model locally on your phone
M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ...
arXiv preprint arXiv:2404.14219, 2024
Cited by: 906 | Year: 2024
Model-free reinforcement learning in infinite-horizon average-reward Markov decision processes
CY Wei, MJ Jahromi, H Luo, H Sharma, R Jain
International conference on machine learning, 10170-10180, 2020
Cited by: 123 | Year: 2020
Evaluating cognitive maps and planning in large language models with CogEval
I Momennejad, H Hasanbeig, F Vieira Frujeri, H Sharma, N Jojic, ...
Advances in Neural Information Processing Systems 36, 69736-69751, 2023
Cited by: 63* | Year: 2023
Fine-tuning language models with advantage-induced policy alignment
B Zhu, H Sharma, FV Frujeri, S Dong, C Zhu, MI Jordan, J Jiao
arXiv preprint arXiv:2306.02231, 2023
Cited by: 41 | Year: 2023
Self-exploring language models: Active preference elicitation for online alignment
S Zhang, D Yu, H Sharma, H Zhong, Z Liu, Z Yang, S Wang, H Hassan, ...
arXiv preprint arXiv:2405.19332, 2024
Cited by: 27 | Year: 2024
Language models can be logical solvers
J Feng, R Xu, J Hao, H Sharma, Y Shen, D Zhao, W Chen
Findings of the Association for Computational Linguistics: NAACL 2024, 2023
Cited by: 22* | Year: 2023
A universal empirical dynamic programming algorithm for continuous state MDPs
WB Haskell, R Jain, H Sharma, P Yu
IEEE Transactions on Automatic Control 65 (1), 115-129, 2019
Cited by: 21 | Year: 2019
Allure: A systematic protocol for auditing and improving LLM-based evaluation of text using iterative in-context learning
H Hasanbeig, H Sharma, L Betthauser, FV Frujeri, I Momennejad
arXiv preprint arXiv:2309.13701, 2023
Cited by: 17 | Year: 2023
Approximate relative value learning for average-reward continuous state MDPs
H Sharma, M Jafarnia-Jahromi, R Jain
Uncertainty in Artificial Intelligence, 956-964, 2020
Cited by: 16 | Year: 2020
An empirical relative value learning algorithm for non-parametric MDPs with continuous state space
H Sharma, R Jain, A Gupta
2019 18th European Control Conference (ECC), 1368-1373, 2019
Cited by: 13 | Year: 2019
Randomized function fitting-based empirical value iteration
WB Haskell, P Yu, H Sharma, R Jain
2017 IEEE 56th Annual Conference on Decision and Control (CDC), 2467-2472, 2017
Cited by: 9 | Year: 2017
An approximately optimal relative value learning algorithm for averaged MDPs with continuous states and actions
H Sharma, R Jain
2019 57th Annual Allerton Conference on Communication, Control, and …, 2019
Cited by: 7 | Year: 2019
Phi-3 safety post-training: Aligning language models with a "break-fix" cycle
E Haider, D Perez-Becker, T Portet, P Madan, A Garg, A Ashfaq, ...
arXiv preprint arXiv:2407.13833, 2024
Cited by: 5 | Year: 2024
Optimal spectrum sensing for cognitive radio with imperfect detector
H Sharma, A Patel, SN Merchant, UB Desai
2014 IEEE 79th Vehicular Technology Conference (VTC Spring), 1-5, 2014
Cited by: 4 | Year: 2014
An empirical dynamic programming algorithm for continuous MDPs
WB Haskell, R Jain, H Sharma, P Yu
arXiv preprint arXiv:1709.07506, 2017
Cited by: 3 | Year: 2017
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
Y Chen, S Wang, Z Yang, H Sharma, N Karampatziakis, D Yu, ...
arXiv preprint arXiv:2407.02119, 2024
Cited by: 1 | Year: 2024
Finite Time Guarantees for Continuous State MDPs with Generative Model
H Sharma, R Jain
2020 59th IEEE Conference on Decision and Control (CDC), 3617-3622, 2020
Cited by: 1 | Year: 2020
Randomized Policy Learning for Continuous State and Action MDPs
H Sharma, R Jain
arXiv preprint arXiv:2006.04331, 2020
Cited by: 1 | Year: 2020
Empirical algorithms for general stochastic systems with continuous states and actions
H Sharma, R Jain, W Haskell
2019 IEEE 58th Conference on Decision and Control (CDC), 6344-6349, 2019
Cited by: 1 | Year: 2019
QoS aware optimal base station ON/OFF policy and frequency planning
H Sharma, V Vaid, P Chaporkar, GS Kasbekar
Indian Inst. Technol. Bombay, 2015
Cited by: 1 | Year: 2015