Cross-lingual language model pretraining
Recent studies have demonstrated the efficiency of generative pretraining for English
natural language understanding. In this work, we extend this approach to multiple …
Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond
We introduce an architecture to learn joint multilingual sentence representations for 93
languages, belonging to more than 30 different families and written in 28 different scripts …
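The zero-shot recipe this abstract describes reduces to: embed sentences from every language into the one shared space, fit a simple classifier on a single language, and apply it unchanged to the rest. A minimal sketch of that data flow, where `embed` is a hypothetical stand-in for the shared encoder (the actual model is a BiLSTM trained on parallel data; the random placeholder below only illustrates the pattern):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(sentences):
    """Placeholder for a shared multilingual sentence encoder.
    Returns one fixed-size vector per sentence; in the real system all
    languages map into the same 1024-dim space."""
    rng = np.random.default_rng(abs(hash(tuple(sentences))) % (2**32))
    return rng.normal(size=(len(sentences), 1024))

# Fit on English embeddings only ...
en_sents, en_labels = ["good movie", "terrible plot"], [1, 0]
clf = LogisticRegression().fit(embed(en_sents), en_labels)

# ... then predict on another language with no training data in it:
# this works only because the embedding space is language-agnostic.
de_sents = ["großartiger Film"]
preds = clf.predict(embed(de_sents))
```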
Offline bilingual word vectors, orthogonal transformations and the inverted softmax
Usually bilingual word vectors are trained "online". Mikolov et al. showed they can also be
found "offline", whereby two pre-trained embeddings are aligned with a linear …
found" offline", whereby two pre-trained embeddings are aligned with a linear …
SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings
Word alignments are useful for tasks like statistical and neural machine translation (NMT)
and cross-lingual annotation projection. Statistical word aligners perform well, as do …
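A minimal sketch of the similarity-based idea, in the spirit of SimAlign's mutual-argmax heuristic (details and names are illustrative). It assumes token embeddings from a multilingual encoder are already in hand; no parallel training data is touched:

```python
import numpy as np

def argmax_align(src_emb, tgt_emb):
    """Align source token i to target token j iff each is the other's
    nearest neighbour by cosine similarity (mutual argmax).
    src_emb: (n, d) array, tgt_emb: (m, d) array of token embeddings."""
    s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = s @ t.T
    fwd = sim.argmax(axis=1)   # best target for each source token
    bwd = sim.argmax(axis=0)   # best source for each target token
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]

# Example with random vectors standing in for encoder outputs:
rng = np.random.default_rng(1)
links = argmax_align(rng.normal(size=(5, 16)), rng.normal(size=(6, 16)))
```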
From word to sense embeddings: A survey on vector representations of meaning
Over the past years, distributed semantic representations have proved to be effective and
flexible keepers of prior knowledge to be integrated into downstream applications. This …