Learning to understand goal specifications by modelling reward D Bahdanau, F Hill, J Leike, E Hughes, A Hosseini, P Kohli, ... ICLR 2019, 2018 | 172 | 2018 |
Fashion-Gen: The Generative Fashion Dataset and Challenge N Rostamzadeh, S Hosseini, T Boquet, W Stokowiec, Y Zhang, C Jauvin, ... arXiv preprint arXiv:1806.08317, 2018 | 165 | 2018 |
Understanding by Understanding Not: Modeling Negation in Language Models A Hosseini, S Reddy, D Bahdanau, RD Hjelm, A Sordoni, A Courville NAACL 2021, 2021 | 88 | 2021 |
V-STaR: Training Verifiers for Self-Taught Reasoners A Hosseini, X Yuan, N Malkin, A Courville, A Sordoni, R Agarwal Conference paper at COLM 2024, 2024 | 62 | 2024 |
Generative Verifiers: Reward Modeling as Next-Token Prediction L Zhang, A Hosseini, H Bansal, M Kazemi, A Kumar, R Agarwal arXiv preprint arXiv:2408.15240, 2024 | 42 | 2024 |
Ordered memory Y Shen, S Tan, A Hosseini, Z Lin, A Sordoni, AC Courville Advances in Neural Information Processing Systems 32, 2019 | 29 | 2019 |
Commonsense mining as knowledge base completion? A study on the impact of novelty S Jastrzębski, D Bahdanau, S Hosseini, M Noukhovitch, Y Bengio, ... arXiv preprint arXiv:1804.09259, 2018 | 29 | 2018 |
Joint Prompt Optimization of Stacked LLMs using Variational Inference A Sordoni, X Yuan, MA Côté, M Pereira, A Trischler, Z Xiao, A Hosseini, ... Thirty-seventh Conference on Neural Information Processing Systems, 2023 | 26 | 2023 |
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling H Bansal, A Hosseini, R Agarwal, VQ Tran, M Kazemi arXiv preprint arXiv:2408.16737, 2024 | 23 | 2024 |
On the Compositional Generalization Gap of In-Context Learning A Hosseini, A Vani, D Bahdanau, A Sordoni, A Courville arXiv preprint arXiv:2211.08473, 2022 | 21 | 2022 |
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization S Huang, M Noukhovitch, A Hosseini, K Rasul, W Wang, L Tunstall Conference paper at COLM 2024, 2024 | 16 | 2024 |
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference A Sordoni, X Yuan, MA Côté, M Pereira, A Trischler, Z Xiao, A Hosseini, ... arXiv preprint arXiv:2306.12509, 2023 | 13 | 2023 |
Not All LLM Reasoners Are Created Equal A Hosseini, A Sordoni, D Toyama, A Courville, R Agarwal arXiv preprint arXiv:2410.01748, 2024 | 1 | 2024 |
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models M Noukhovitch, S Huang, S Xhonneux, A Hosseini, R Agarwal, ... arXiv preprint arXiv:2410.18252, 2024 | | 2024 |
On the reproducibility of gradient-based Meta-Reinforcement Learning baselines T Deleu, S Guiroy, S Hosseini | | 2018 |
Faster, More Efficient RLHF through Off-Policy Asynchronous Learning M Noukhovitch, S Huang, S Xhonneux, A Hosseini, R Agarwal, ... NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles … | | |