BLOOM: A 176B-parameter open-access multilingual language model T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | 1768 | 2023 |
Quality at a glance: An audit of web-crawled multilingual datasets J Kreutzer, I Caswell, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ... Transactions of the Association for Computational Linguistics 10, 50-72, 2022 | 151 | 2022 |
A few thousand translations go a long way! Leveraging pre-trained models for African news translation DI Adelani, JO Alabi, A Fan, J Kreutzer, X Shen, M Reid, D Ruiter, ... arXiv preprint arXiv:2205.02022, 2022 | 47 | 2022 |
Quality at a glance: An audit of web-crawled multilingual datasets I Caswell, J Kreutzer, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ... arXiv e-prints, arXiv:2103.12028, 2021 | 38 | 2021 |
BLOOM: A 176B-parameter open-access multilingual language model T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... CoRR, abs/2211.05100, 2022, doi: 10.48550/arXiv.2211.05100 | 23 | 2022 |
Bloom library: Multimodal datasets in 300+ languages for a variety of downstream tasks C Leong, J Nemecek, J Mansdorfer, A Filighera, A Owodunni, ... arXiv preprint arXiv:2210.14712, 2022 | 22 | 2022 |
BibleTTS: A large, high-fidelity, multilingual, and uniquely African speech corpus J Meyer, DI Adelani, E Casanova, A Öktem, DWJ Weber, S Kabongo, ... arXiv preprint arXiv:2207.03546, 2022 | 20 | 2022 |
Documenting geographically and contextually diverse data sources: The BigScience catalogue of language data and resources A McMillan-Major, Z Alyafeai, S Biderman, K Chen, F De Toni, G Dupont, ... arXiv preprint arXiv:2201.10066, 2022 | 19 | 2022 |
D Adelani, J Alabi, A Fan, J Kreutzer, X Shen, M Reid, D Ruiter, D Klakow, ..., Guyo Jarso, Oreen Yousuf, Andre Niyongabo Rubungo, Gilles Hacheme, Eric Peter Wairagala, Muhammad Umair Nasir | 16 | 2022 |
JWSign: A highly multilingual corpus of Bible translations for more diversity in sign language processing S Gueuwou, S Siake, C Leong, M Müller arXiv preprint arXiv:2311.10174, 2023 | 12 | 2023 |
Unified coincident optical and radar for recognition (UNICORN) 2008 dataset C Leong, T Rovito, O Mendoza-Schrock, C Menart, J Bowser, L Moore, ... https://github.com/AFRL-RY/data-unicorn-2008, 2019 (accessed 2020-04-15) | 8 | 2019 |
Adapting to the low-resource double-bind: investigating low-compute methods on low-resource African languages C Leong, H Shandilya, BFP Dossou, AL Tonja, J Mathew, AH Omotayo, ... arXiv preprint arXiv:2303.16985, 2023 | 7 | 2023 |
Phone-ing it in: Towards flexible multi-modal language model training by phonetic representations of data C Leong, D Whitenack Proceedings of the 60th Annual Meeting of the Association for Computational …, 2022 | 6 | 2022 |
The eBible corpus: Data and model benchmarks for Bible translation for low-resource languages V Akerman, D Baines, D Daspit, U Hermjakob, T Jang, C Leong, M Martin, ... arXiv preprint arXiv:2304.09919, 2023 | 4 | 2023 |
Enhancing multi-domain automatic short answer grading through an explainable neuro-symbolic pipeline F Künnecke, A Filighera, C Leong, T Steuer arXiv preprint arXiv:2403.01811, 2024 | 1 | 2024 |
Characterization of CNN classifier performance with respect to variation in optical contrast, using synthetic electro-optical data C Menart, C Leong, O Mendoza-Schrock, E Zelnio Automatic Target Recognition XXIX 10988, 143-153, 2019 | 1 | 2019 |
Documenting geographically and contextually diverse language data sources A McMillan-Major, F De Toni, Z Alyafeai, S Biderman, K Chen, G Dupont, ... Northern European Journal of Language Technology 10 (1), 50-77, 2024 | | 2024 |
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation D Ifeoluwa Adelani, J Oluwadara Alabi, A Fan, J Kreutzer, X Shen, M Reid, ... arXiv e-prints, arXiv:2205.02022, 2022 | | 2022 |