On the linguistic representational power of neural machine translation models

Y Belinkov, N Durrani, F Dalvi, H Sajjad… - Computational …, 2020 - direct.mit.edu
Despite the recent success of deep neural networks in natural language processing and
other spheres of artificial intelligence, their interpretability remains a challenge. We analyze …

Are all languages created equal in multilingual BERT?

S Wu, M Dredze - arXiv preprint arXiv:2005.09093, 2020 - arxiv.org
Multilingual BERT (mBERT) trained on 104 languages has shown surprisingly good cross-
lingual performance on several NLP tasks, even without explicit cross-lingual signals …

UDPipe 2.0 prototype at CoNLL 2018 UD shared task

M Straka - Proceedings of the CoNLL 2018 shared task …, 2018 - aclanthology.org
UDPipe is a trainable pipeline which performs sentence segmentation, tokenization, POS
tagging, lemmatization and dependency parsing. We present a prototype for UDPipe 2.0 …

JW300: A wide-coverage parallel corpus for low-resource languages

Ž Agić, I Vulić - 2019 - repository.cam.ac.uk
Viable cross-lingual transfer critically depends on the availability of parallel texts. Shortage
of such resources imposes a development and evaluation bottleneck in multilingual …

Small and practical BERT models for sequence labeling

H Tsai, J Riesa, M Johnson, N Arivazhagan… - arXiv preprint arXiv …, 2019 - arxiv.org
We propose a practical scheme to train a single multilingual sequence labeling model that
yields state-of-the-art results and is small and fast enough to run on a single CPU. Starting …

A primer on pretrained multilingual language models

S Doddapaneni, G Ramesh, MM Khapra… - arXiv preprint arXiv …, 2021 - arxiv.org
Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc., have
emerged as a viable option for bringing the power of pretraining to a large number of …

English intermediate-task training improves zero-shot cross-lingual transfer too

J Phang, I Calixto, PM Htut, Y Pruksachatkun… - arXiv preprint arXiv …, 2020 - arxiv.org
Intermediate-task training (fine-tuning a pretrained model on an intermediate task before
fine-tuning again on the target task) often improves model performance substantially on …

Specializing word embeddings (for parsing) by information bottleneck

XL Li, J Eisner - arXiv preprint arXiv:1910.00163, 2019 - arxiv.org
Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic
information, resulting in state-of-the-art performance on various tasks. We propose a very …

When is BERT multilingual? Isolating crucial ingredients for cross-lingual transfer

A Deshpande, P Talukdar, K Narasimhan - arXiv preprint arXiv …, 2021 - arxiv.org
While recent work on multilingual language models has demonstrated their capacity for
cross-lingual zero-shot transfer on downstream tasks, there is a lack of consensus in the …

Viable dependency parsing as sequence labeling

M Strzyz, D Vilares, C Gómez-Rodríguez - arXiv preprint arXiv:1902.10505, 2019 - arxiv.org
We recast dependency parsing as a sequence labeling problem, exploring several
encodings of dependency trees as labels. While dependency parsing by means of …
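To make the idea in the snippet above concrete, here is a minimal sketch (not the authors' code) of one way a dependency tree can be encoded as one label per token; the paper explores several encodings, and this sketch assumes a simple relative head-offset scheme with an "offset_relation" label format, where the function names and example sentence are illustrative only.

# Sketch of encoding a dependency tree as per-token labels (relative head offsets).
# heads[i] is the 1-indexed head of token i+1, with 0 marking the root; rels[i] is its relation.

def encode(heads, rels):
    """Turn a dependency tree into one label per token: 'offset_relation'."""
    labels = []
    for i, (head, rel) in enumerate(zip(heads, rels), start=1):
        # Offset 0 is reserved for the root; no token can be its own head,
        # so a genuine zero offset never occurs.
        offset = 0 if head == 0 else head - i
        labels.append(f"{offset}_{rel}")
    return labels

def decode(labels):
    """Recover (head, relation) pairs from the per-token labels."""
    tree = []
    for i, label in enumerate(labels, start=1):
        offset_str, rel = label.split("_", 1)
        offset = int(offset_str)
        head = 0 if offset == 0 else i + offset
        tree.append((head, rel))
    return tree

# Example: "She reads books", with 'reads' as the root.
heads = [2, 0, 2]
rels = ["nsubj", "root", "obj"]
labels = encode(heads, rels)           # ['1_nsubj', '0_root', '-1_obj']
assert decode(labels) == [(2, "nsubj"), (0, "root"), (2, "obj")]

Once trees are flattened this way, any off-the-shelf sequence labeler can be trained to predict the labels, and the tree is recovered by decoding them back into head-relation pairs.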