Transformers: State-of-the-Art Natural Language Processing | T Wolf | arXiv preprint arXiv:1910.03771, 2020 | 9549 | 2020 |
BLOOM: A 176B-parameter open-access multilingual language model | T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | | 1782 | 2023 |
Datasets: A Community Library for Natural Language Processing | Q Lhoest, A Villanova del Moral, Y Jernite, A Thakur, P von Platen, S Patil, ... | Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021 | 609 | 2021 |
The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset | H Laurençon, L Saulnier, T Wang, C Akiki, A Villanova del Moral, ... | Advances in Neural Information Processing Systems 35, 31809-31826, 2022 | 192 | 2022 |
Distributed deep learning in open collaborations | M Diskin, A Bukhtiyarov, M Ryabinin, L Saulnier, A Sinitsin, D Popov, ... | Advances in Neural Information Processing Systems 34, 7879-7897, 2021 | 59 | 2021 |
Croissant: A metadata format for ML-ready datasets | M Akhtar, O Benjelloun, C Conforti, L Foschini, J Giner-Miguelez, ... | Advances in Neural Information Processing Systems 37, 82133-82148, 2025 | 32 | 2025 |
Evaluate & Evaluation on the Hub: Better best practices for data and model measurements | L Von Werra, L Tunstall, A Thakur, AS Luccioni, T Thrush, A Piktus, ... | arXiv preprint arXiv:2210.01970, 2022 | 24 | 2022 |
Training transformers together | A Borzunov, M Ryabinin, T Dettmers, Q Lhoest, L Saulnier, M Diskin, ... | NeurIPS 2021 Competitions and Demonstrations Track, 335-342, 2022 | 11 | 2022 |
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages | CC Emezue, S Gandhi, L Tunstall, A Abid, J Meyer, Q Lhoest, P Allen, ... | arXiv preprint arXiv:2303.12582, 2023 | 1 | 2023 |
Actes de la conférence CAID 2020 | F de Vieilleville, S May, A Lagrange, A Dupuis, R Ruiloba, FN Mboula, ... | | | 2021 |