Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2562 | 2023 |
Gemma: Open models based on Gemini research and technology G Team, T Mesnard, C Hardin, R Dadashi, S Bhupatiraju, S Pathak, ... arXiv preprint arXiv:2403.08295, 2024 | 1085* | 2024 |
What matters for on-policy deep actor-critic methods? A large-scale study M Andrychowicz, A Raichuk, P Stanczyk, M Orsini, S Girgin, R Marinier, ... International Conference on Learning Representations (ICLR), 2021, 2021 | 501* | 2021 |
Gemma 2: Improving open language models at a practical size G Team, M Riviere, S Pathak, PG Sessa, C Hardin, S Bhupatiraju, ... arXiv preprint arXiv:2408.00118, 2024 | 395* | 2024 |
Acme: A research framework for distributed reinforcement learning MW Hoffman, B Shahriari, J Aslanides, G Barth-Maron, N Momchev, ... arXiv preprint arXiv:2006.00979, 2022 | 274 | 2022 |
Primal Wasserstein imitation learning R Dadashi, L Hussenot, M Geist, O Pietquin International Conference on Learning Representations (ICLR), 2021, 2020 | 151 | 2020 |
What Matters for Adversarial Imitation Learning? M Orsini*, A Raichuk*, L Hussenot*, D Vincent, R Dadashi, S Girgin, ... NeurIPS 35th Conference on Neural Information Processing Systems, 2021 | 83 | 2021 |
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback P Roit, J Ferret, L Shani, R Aharoni, G Cideron, R Dadashi, M Geist, ... ACL 2023 Proceedings, forthcoming. Association for Computational Linguistics, 2023 | 74 | 2023 |
WARM: On the benefits of weight averaged reward models A Ramé, N Vieillard, L Hussenot, R Dadashi, G Cideron, O Bachem, ... arXiv preprint arXiv:2401.12187, 2024 | 61 | 2024 |
Offline Reinforcement Learning as Anti-Exploration S Rezaeifar, R Dadashi, N Vieillard, L Hussenot, O Bachem, O Pietquin, ... AAAI 2022, 2021 | 61 | 2021 |
Offline Reinforcement Learning with Pseudometric Learning R Dadashi, S Rezaeifar, N Vieillard, L Hussenot, O Pietquin, M Geist International Conference on Machine Learning (ICML), 2021, 2021 | 42 | 2021 |
Continuous Control with Action Quantization from Demonstrations R Dadashi*, L Hussenot*, D Vincent, S Girgin, A Raichuk, M Geist, ... International Conference on Machine Learning (ICML), 2022, 2021 | 36 | 2021 |
CopyCAT: Taking control of neural policies with constant attacks L Hussenot, M Geist, O Pietquin International Conference on Autonomous Agents and Multiagent Systems (AAMAS …, 2019 | 33 | 2019 |
Targeted attacks on deep reinforcement learning agents through adversarial observations L Hussenot, M Geist, O Pietquin ICML Workshop, 2020 | 23 | 2020 |
Hyperparameter Selection for Imitation Learning L Hussenot, M Andrychowicz, D Vincent, R Dadashi, A Raichuk, ... International Conference on Machine Learning (ICML), 2021, 2021 | 22 | 2021 |
BOND: Aligning LLMs with best-of-n distillation PG Sessa, R Dadashi, L Hussenot, J Ferret, N Vieillard, A Ramé, ... arXiv preprint arXiv:2407.14622, 2024 | 20 | 2024 |
WARP: On the benefits of weight averaged rewarded policies A Ramé, J Ferret, N Vieillard, R Dadashi, L Hussenot, PL Cedoz, ... arXiv preprint arXiv:2406.16768, 2024 | 16* | 2024 |
RLDS: an ecosystem to generate, share and use datasets in reinforcement learning S Ramos, S Girgin, L Hussenot, D Vincent, H Yakubovich, D Toyama, ... arXiv preprint arXiv:2111.02767, 2021 | 16 | 2021 |
MusicRL: Aligning music generation to human preferences G Cideron, S Girgin, M Verzetti, D Vincent, M Kastelic, Z Borsos, ... arXiv preprint arXiv:2402.04229, 2024 | 14 | 2024 |
Show me the Way: Intrinsic Motivation from Demonstrations L Hussenot, R Dadashi, M Geist, O Pietquin International Conference on Autonomous Agents and Multiagent Systems (AAMAS …, 2020 | 12 | 2020 |