ParaCrawl: Web-Scale Acquisition of Parallel Corpora M Bañón, P Chen, B Haddow, K Heafield, H Hoang, M Esplà-Gomis, ... | 274* | |
Prompsit’s submission to WMT 2018 Parallel Corpus Filtering shared task VM Sánchez-Cartagena, M Bañón, SO Rojas, G Ramírez-Sánchez Proceedings of the Third Conference on Machine Translation: Shared Task …, 2018 | 65 | 2018 |
Bifixer and Bicleaner: two open-source tools to clean your parallel data G Ramírez-Sánchez, J Zaragoza-Bernabeu, M Bañón, S Ortiz-Rojas Proceedings of the 22nd Annual Conference of the European Association for …, 2020 | 46 | 2020 |
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages M Bañón, M Esplà-Gomis, ML Forcada, C García-Romero, T Kuzman, ... Proceedings of the 23rd Annual Conference of the European Association for …, 2022 | 22 | 2022 |
Bicleaner AI: Bicleaner Goes Neural J Zaragoza-Bernabeu, G Ramírez-Sánchez, M Bañón, S Ortiz-Rojas Proceedings of the Thirteenth Language Resources and Evaluation Conference …, 2022 | 13 | 2022 |
A New Massive Multilingual Dataset for High-Performance Language Technologies O de Gibert, G Nail, N Arefyev, M Bañón, J van der Linde, S Ji, ... arXiv preprint arXiv:2403.14009, 2024 | 12 | 2024 |
Slovene-English parallel corpus MaCoCu-sl-en 1.0 M Bañón, M Esplà-Gomis, ML Forcada, C García-Romero, T Kuzman, ... Jožef Stefan Institute, 2022 | 5 | 2022 |
ParaCrawl corpus version 1.0 P Koehn, K Heafield, ML Forcada, M Esplà-Gomis, S Ortiz-Rojas, ... LINDAT/CLARIN digital library at the Institute of Formal and Applied …, 2018 | 5 | 2018 |
Croatian web corpus MaCoCu-hr 2.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 4* | 2023 |
FastSpell: the LangId Magic Spell M Bañón, J Zaragoza-Bernabeu, G Ramírez-Sánchez, S Ortiz-Rojas arXiv preprint arXiv:2404.08345, 2024 | 3 | 2024 |
Montenegrin web corpus MaCoCu-cnr 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 2* | 2023 |
Serbian web corpus MaCoCu-sr 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | 2 | 2023 |
Human evaluation of web-crawled parallel corpora for machine translation G Ramírez-Sánchez, M Bañón, J Zaragoza-Bernabeu, S Ortiz-Rojas Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval …, 2022 | 2 | 2022 |
Bulgarian-English parallel corpus MaCoCu-bg-en 1.0 M Bañón, M Esplà-Gomis, ML Forcada, C García-Romero, T Kuzman, ... Jožef Stefan Institute, 2022 | 1 | 2022 |
Macedonian-English parallel corpus MaCoCu-mk-en 1.0 M Bañón, M Esplà-Gomis, ML Forcada, C García-Romero, T Kuzman, ... Jožef Stefan Institute, 2022 | 1 | 2022 |
Catalan web corpus MaCoCu-ca 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |
Greek web corpus MaCoCu-el 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |
Ukrainian web corpus MaCoCu-uk 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |
Albanian-English parallel corpus MaCoCu-sq-en 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |
Bosnian-English parallel corpus MaCoCu-bs-en 1.0 M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ... Jožef Stefan Institute, 2023 | | 2023 |