AMMUS: A survey of transformer-based pretrained models in natural language processing
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …
The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
One of the biggest challenges hindering progress in low-resource and multilingual machine
translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either …
Beyond English-centric multilingual machine translation
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …
Documenting large webtext corpora: A case study on the Colossal Clean Crawled Corpus
Large language models have led to remarkable progress on many NLP tasks, and
researchers are turning to ever-larger text corpora to train them. Some of the largest corpora …
DeepNet: Scaling Transformers to 1,000 layers
In this paper, we propose a simple yet effective method to stabilize extremely deep
Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify …
Findings of the 2021 conference on machine translation (WMT21)
This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …
WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia
We present an approach based on multilingual sentence embeddings to automatically
extract parallel sentences from the content of Wikipedia articles in 85 languages, including …
Language varieties of Italy: Technology challenges and opportunities
Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which
implicitly encodes local knowledge, cultural traditions, artistic expressions, and history of its …
PanGu-Σ: Towards trillion parameter language model with sparse heterogeneous computing
The scaling of large language models has greatly improved natural language
understanding, generation, and reasoning. In this work, we develop a system that trained a …
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages
We present Samanantar, the largest publicly available parallel corpora collection for Indic
languages. The collection contains a total of 49.7 million sentence pairs between English …