Systematic inequalities in language technology performance across the world's languages

D Blasi, A Anastasopoulos, G Neubig - arXiv preprint arXiv:2110.06733, 2021 - arxiv.org
Natural language processing (NLP) systems have become a central technology in
communication, education, medicine, artificial intelligence, and many other domains of …

JW300: A wide-coverage parallel corpus for low-resource languages

Ž Agić, I Vulić - 2019 - repository.cam.ac.uk
Viable cross-lingual transfer critically depends on the availability of parallel texts. Shortage
of such resources imposes a development and evaluation bottleneck in multilingual …

Learning to recombine and resample data for compositional generalization

E Akyürek, AF Akyürek, J Andreas - arXiv preprint arXiv:2010.03706, 2020 - arxiv.org
Flexible neural sequence models outperform grammar- and automaton-based counterparts
on a variety of tasks. However, neural models perform poorly in settings requiring …

UniMorph 3.0: Universal Morphology

AD McCarthy, C Kirov, M Grella… - … of The 12th …, 2020 - research-collection.ethz.ch
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-
coverage instantiated normalized morphological paradigms for hundreds of diverse world …

The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

R Cotterell, C Kirov, J Sylak-Glassman… - arXiv preprint arXiv …, 2018 - arxiv.org
The CoNLL–SIGMORPHON 2018 shared task on supervised learning of morphological
generation featured data sets from 103 typologically diverse languages. Apart from …

The SIGMORPHON 2019 shared task: Morphological analysis in context and cross-lingual transfer for inflection

AD McCarthy, E Vylomova, S Wu, C Malaviya… - arXiv preprint arXiv …, 2019 - arxiv.org
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in
morphology examined transfer learning of inflection between 100 language pairs, as well as …

The Johns Hopkins University Bible corpus: 1600+ tongues for typological exploration

AD McCarthy, R Wicks, D Lewis, A Mueller… - Proceedings of the …, 2020 - aclanthology.org
We present findings from the creation of a massively parallel corpus in over 1600
languages, the Johns Hopkins University Bible Corpus (JHUBC). The corpus consists of …

Are all languages equally hard to language-model?

R Cotterell, SJ Mielke, J Eisner, B Roark - arXiv preprint arXiv:1806.03743, 2018 - arxiv.org
For general modeling methods applied to diverse languages, a natural question is: how well
should we expect our models to work on languages with differing typological profiles? In this …

What kind of language is hard to language-model?

SJ Mielke, R Cotterell, K Gorman, B Roark… - arXiv preprint arXiv …, 2019 - arxiv.org
How language-agnostic are current state-of-the-art NLP tools? Are there some types of
language that are easier to model with current methods? In prior work (Cotterell et al., 2018) …

Massively multilingual pronunciation modeling with WikiPron

JL Lee, LFE Ashby, ME Garza… - Proceedings of the …, 2020 - aclanthology.org
We introduce WikiPron, an open-source command-line tool for extracting pronunciation data
from Wiktionary, a collaborative multilingual online dictionary. We first describe the design …