Arian Hosseini
Mila
Verified email at umontreal.ca
Title
Cited by
Year
Learning to understand goal specifications by modelling reward
D Bahdanau, F Hill, J Leike, E Hughes, A Hosseini, P Kohli, ...
ICLR 2019, 2018
172 2018
Fashion-Gen: The Generative Fashion Dataset and Challenge
N Rostamzadeh, S Hosseini, T Boquet, W Stokowiec, Y Zhang, C Jauvin, ...
arXiv preprint arXiv:1806.08317, 2018
165 2018
Understanding by Understanding Not: Modeling Negation in Language Models
A Hosseini, S Reddy, D Bahdanau, RD Hjelm, A Sordoni, A Courville
NAACL 2021, 2021
88 2021
V-STaR: Training Verifiers for Self-Taught Reasoners
A Hosseini, X Yuan, N Malkin, A Courville, A Sordoni, R Agarwal
Conference paper at COLM 2024, 2024
62 2024
Generative Verifiers: Reward Modeling as Next-Token Prediction
L Zhang, A Hosseini, H Bansal, M Kazemi, A Kumar, R Agarwal
arXiv preprint arXiv:2408.15240, 2024
42 2024
Ordered memory
Y Shen, S Tan, A Hosseini, Z Lin, A Sordoni, AC Courville
Advances in Neural Information Processing Systems 32, 2019
29 2019
Commonsense mining as knowledge base completion? A study on the impact of novelty
S Jastrzębski, D Bahdanau, S Hosseini, M Noukhovitch, Y Bengio, ...
arXiv preprint arXiv:1804.09259, 2018
29 2018
Joint Prompt Optimization of Stacked LLMs using Variational Inference
A Sordoni, X Yuan, MA Côté, M Pereira, A Trischler, Z Xiao, A Hosseini, ...
Thirty-seventh Conference on Neural Information Processing Systems, 2023
26 2023
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
H Bansal, A Hosseini, R Agarwal, VQ Tran, M Kazemi
arXiv preprint arXiv:2408.16737, 2024
23 2024
On the Compositional Generalization Gap of In-Context Learning
A Hosseini, A Vani, D Bahdanau, A Sordoni, A Courville
arXiv preprint arXiv:2211.08473, 2022
21 2022
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
S Huang, M Noukhovitch, A Hosseini, K Rasul, W Wang, L Tunstall
Conference paper at COLM 2024, 2024
16 2024
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
A Sordoni, X Yuan, MA Côté, M Pereira, A Trischler, Z Xiao, A Hosseini, ...
arXiv preprint arXiv:2306.12509, 2023
13 2023
Not All LLM Reasoners Are Created Equal
A Hosseini, A Sordoni, D Toyama, A Courville, R Agarwal
arXiv preprint arXiv:2410.01748, 2024
1 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
M Noukhovitch, S Huang, S Xhonneux, A Hosseini, R Agarwal, ...
arXiv preprint arXiv:2410.18252, 2024
2024
On the reproducibility of gradient-based Meta-Reinforcement Learning baselines
T Deleu, S Guiroy, S Hosseini
2018
Faster, More Efficient RLHF through Off-Policy Asynchronous Learning
M Noukhovitch, S Huang, S Xhonneux, A Hosseini, R Agarwal, ...
NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles …