Text algorithms in economics
This article provides an overview of the methods used for algorithmic text analysis in
economics, with a focus on three key contributions. First, we introduce methods for …
economics, with a focus on three key contributions. First, we introduce methods for …
A survey of multilingual neural machine translation
We present a survey on multilingual neural machine translation (MNMT), which has gained
a lot of traction in recent years. MNMT has been useful in improving translation quality as a …
a lot of traction in recent years. MNMT has been useful in improving translation quality as a …
Text embeddings by weakly-supervised contrastive pre-training
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …
wide range of tasks. The model is trained in a contrastive manner with weak supervision …
Beyond english-centric multilingual machine translation
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …
translation by training a single model able to translate between any pair of languages …
COMET: A neural framework for MT evaluation
We present COMET, a neural framework for training multilingual machine translation
evaluation models which obtains new state-of-the-art levels of correlation with human …
evaluation models which obtains new state-of-the-art levels of correlation with human …
Language-agnostic BERT sentence embedding
While BERT is an effective method for learning monolingual sentence embeddings for
semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019) …
semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019) …
VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …
Making monolingual sentence embeddings multilingual using knowledge distillation
We present an easy and efficient method to extend existing sentence embedding models to
new languages. This allows to create multilingual versions from previously monolingual …
new languages. This allows to create multilingual versions from previously monolingual …
Xtreme: A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation
Much recent progress in applications of machine learning models to NLP has been driven
by benchmarks that evaluate models across a wide variety of tasks. However, these broad …
by benchmarks that evaluate models across a wide variety of tasks. However, these broad …
The state and fate of linguistic diversity and inclusion in the NLP world
Language technologies contribute to promoting multilingualism and linguistic diversity
around the world. However, only a very small number of the over 7000 languages of the …
around the world. However, only a very small number of the over 7000 languages of the …