Colin Leong
Title
Cited by
Year
BLOOM: A 176B-parameter open-access multilingual language model
T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ...
Cited by 1768, 2023
Quality at a glance: An audit of web-crawled multilingual datasets
J Kreutzer, I Caswell, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ...
Transactions of the Association for Computational Linguistics 10, 50-72, 2022
Cited by 151, 2022
A few thousand translations go a long way! Leveraging pre-trained models for African news translation
DI Adelani, JO Alabi, A Fan, J Kreutzer, X Shen, M Reid, D Ruiter, ...
arXiv preprint arXiv:2205.02022, 2022
Cited by 47, 2022
Quality at a glance: An audit of web-crawled multilingual datasets
I Caswell, J Kreutzer, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ...
arXiv e-prints, arXiv: 2103.12028, 2021
Cited by 38, 2021
BLOOM: A 176b-parameter open-access multilingual language model. CoRR, abs/2211.05100, 2022. doi: 10.48550
T Le Scao, A Fan, C Akiki, E Pavlick, S Ilic, D Hesslow, R Castagné, ...
arXiv preprint arXiv:2211.05100
Cited by 23
Bloom Library: Multimodal datasets in 300+ languages for a variety of downstream tasks
C Leong, J Nemecek, J Mansdorfer, A Filighera, A Owodunni, ...
arXiv preprint arXiv:2210.14712, 2022
Cited by 22, 2022
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus
J Meyer, DI Adelani, E Casanova, A Öktem, DWJ Weber, S Kabongo, ...
arXiv preprint arXiv:2207.03546, 2022
Cited by 20, 2022
Documenting geographically and contextually diverse data sources: The BigScience catalogue of language data and resources
A McMillan-Major, Z Alyafeai, S Biderman, K Chen, F De Toni, G Dupont, ...
arXiv preprint arXiv:2201.10066, 2022
Cited by 19, 2022
Guyo Jarso, Oreen Yousuf, Andre Niyongabo Rubungo, Gilles Hacheme, Eric Peter Wairagala, Muhammad Umair Nasir
D Adelani, J Alabi, A Fan, J Kreutzer, X Shen, M Reid, D Ruiter, D Klakow, ...
Cited by 16, 2022
JWSign: A highly multilingual corpus of Bible translations for more diversity in sign language processing
S Gueuwou, S Siake, C Leong, M Müller
arXiv preprint arXiv:2311.10174, 2023
Cited by 12, 2023
Unified coincident optical and radar for recognition (UNICORN) 2008 dataset
C Leong, T Rovito, O Mendoza-Schrock, C Menart, J Bowser, L Moore, ...
2020-04-15]. https://github.com/AFRL-RY/data-unicorn-2008, 2019
Cited by 8, 2019
Adapting to the low-resource double-bind: investigating low-compute methods on low-resource African languages
C Leong, H Shandilya, BFP Dossou, AL Tonja, J Mathew, AH Omotayo, ...
arXiv preprint arXiv:2303.16985, 2023
Cited by 7, 2023
Phone-ing it in: Towards flexible multi-modal language model training by phonetic representations of data
C Leong, D Whitenack
Proceedings of the 60th Annual Meeting of the Association for Computational …, 2022
Cited by 6, 2022
The eBible corpus: Data and model benchmarks for Bible translation for low-resource languages
V Akerman, D Baines, D Daspit, U Hermjakob, T Jang, C Leong, M Martin, ...
arXiv preprint arXiv:2304.09919, 2023
Cited by 4, 2023
Enhancing multi-domain automatic short answer grading through an explainable neuro-symbolic pipeline
F Künnecke, A Filighera, C Leong, T Steuer
arXiv preprint arXiv:2403.01811, 2024
Cited by 1, 2024
Characterization of CNN classifier performance with respect to variation in optical contrast, using synthetic electro-optical data
C Menart, C Leong, O Mendoza-Schrock, E Zelnio
Automatic Target Recognition XXIX 10988, 143-153, 2019
Cited by 1, 2019
Documenting geographically and contextually diverse language data sources
A McMillan-Major, F De Toni, Z Alyafeai, S Biderman, K Chen, G Dupont, ...
Northern European Journal of Language Technology 10 (1), 50-77, 2024
2024
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
D Ifeoluwa Adelani, J Oluwadara Alabi, A Fan, J Kreutzer, X Shen, M Reid, ...
arXiv e-prints, arXiv: 2205.02022, 2022
2022
Articles 1–18