RLAIF: Scaling reinforcement learning from human feedback with AI feedback H Lee, S Phatale, H Mansoor, KR Lu, T Mesnard, J Ferret, C Bishop, ... | 488 | 2023 |
LLMs cannot find reasoning errors, but can correct them given the error location G Tyen, H Mansoor, V Cărbune, P Chen, T Mak arXiv preprint arXiv:2311.08516, 2023 | 77 | 2023 |
RLAIF vs. RLHF: Scaling reinforcement learning from human feedback with AI feedback H Lee, S Phatale, H Mansoor, T Mesnard, J Ferret, K Lu, C Bishop, E Hall, ... arXiv preprint arXiv:2309.00267, 2023 | 68 | 2023 |
ScreenAI: A vision-language model for UI and infographics understanding G Baechler, S Sunkara, M Wang, F Zubach, H Mansoor, V Etter, ... arXiv preprint arXiv:2402.04615, 2024 | 45 | 2024 |
Chart-based reasoning: Transferring capabilities from LLMs to VLMs V Carbune, H Mansoor, F Liu, R Aralikatte, G Baechler, J Chen, A Sharma arXiv preprint arXiv:2403.12596, 2024 | 10 | 2024 |
Methods and systems for predicting conversion rates of content publisher and content provider pairs R Kirillov, H Mansoor US Patent 9,246,990, 2016 | 10 | 2016 |
RLAIF: Scaling reinforcement learning from human feedback with AI feedback, 2024 H Lee, S Phatale, H Mansoor, KR Lu, T Mesnard, J Ferret, C Bishop, ... URL https://openreview.net/forum | 8 | |
PERL: Parameter efficient reinforcement learning from human feedback H Sidahmed, S Phatale, A Hutcheson, Z Lin, Z Chen, Z Yu, J Jin, ... arXiv e-prints, arXiv:2403.10704, 2024 | 7 | 2024 |
Methods and systems for providing an actionable object within a third-party content slot of an information resource of a content publisher R Kirillov, A Tyler, D Banfield, H Mansoor, DM Goodridge, LA Collard US Patent 9,461,936, 2016 | 7 | 2016 |
Methods and systems for providing an actionable object within a third-party content slot of an information resource of a content publisher R Kirillov, A Tyler, D Banfield, H Mansoor, DM Goodridge, LA Collard US Patent 10,067,916, 2018 | 6 | 2018 |
LLMs cannot find reasoning errors, but can correct them given the error location, 2024 G Tyen, H Mansoor, V Carbune, P Chen, T Mak URL https://arxiv.org/abs/2311.08516 | 4 | |
AuPair: Golden Example Pairs for Code Repair A Mavalankar, H Mansoor, Z Marinho, M Samsikova, T Schaul arXiv preprint arXiv:2502.18487, 2025 | | 2025 |
VQA Training Sets are Self-play Environments for Generating Few-shot Pools T Misiunas, H Mansoor, J Uijlings, O Riva, V Carbune arXiv preprint arXiv:2405.19773, 2024 | | 2024 |
Parameter Efficient Reinforcement Learning from Human Feedback H Sidahmed, S Phatale, A Hutcheson, Z Lin, Z Chen, Z Yu, J Jin, ... arXiv preprint arXiv:2403.10704, 2024 | | 2024 |
The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization S Gooding, H Mansoor arXiv preprint arXiv:2311.04919, 2023 | | 2023 |
Methods and systems for providing an actionable object within a third-party content slot of an information resource of a content publisher R Kirillov, A Tyler, D Banfield, H Mansoor, DM Goodridge, LA Collard US Patent 10,210,140, 2019 | | 2019 |