A survey of the usages of deep learning for natural language processing

DW Otter, JR Medina, JK Kalita - IEEE Transactions on Neural …, 2020 - ieeexplore.ieee.org
Over the last several years, the field of natural language processing has been propelled
forward by an explosion in the use of deep learning models. This article provides a brief …

Universal Dependencies

MC De Marneffe, CD Manning, J Nivre… - Computational …, 2021 - direct.mit.edu
Universal Dependencies (UD) is a framework for morphosyntactic annotation of human
language, which to date has been used to create treebanks for more than 100 languages. In …

Universal Dependencies v2: An evergrowing multilingual treebank collection

J Nivre, MC De Marneffe, F Ginter, J Hajič… - arXiv preprint arXiv …, 2020 - arxiv.org
Universal Dependencies is an open community effort to create cross-linguistically consistent
treebank annotation for many languages within a dependency-based lexicalist framework …

How multilingual is multilingual BERT?

T Pires, E Schlinger, D Garrette - arXiv preprint arXiv:1906.01502, 2019 - arxiv.org
In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as
a single language model pre-trained from monolingual corpora in 104 languages, is …

FLAIR: An easy-to-use framework for state-of-the-art NLP

A Akbik, T Bergmann, D Blythe, K Rasul… - Proceedings of the …, 2019 - aclanthology.org
We present FLAIR, an NLP framework designed to facilitate training and distribution of state-
of-the-art sequence labeling, text classification and language models. The core idea of the …

Machine learning for ancient languages: A survey

T Sommerschield, Y Assael, J Pavlopoulos… - Computational …, 2023 - direct.mit.edu
Ancient languages preserve the cultures and histories of the past. However, their study is
fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from …

IndoLEM and IndoBERT: A benchmark dataset and pre-trained language model for Indonesian NLP

F Koto, A Rahimi, JH Lau, T Baldwin - arXiv preprint arXiv:2011.00677, 2020 - arxiv.org
Although the Indonesian language is spoken by almost 200 million people and is the 10th
most spoken language in the world, it is under-represented in NLP research. Previous work …

Multilingual is not enough: BERT for Finnish

A Virtanen, J Kanerva, R Ilo, J Luoma… - arXiv preprint arXiv …, 2019 - arxiv.org
Deep learning-based language models pretrained on large unannotated text corpora have
been demonstrated to allow efficient transfer learning for natural language processing, with …

Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe

M Straka, J Straková - Proceedings of the CoNLL 2017 shared …, 2017 - aclanthology.org
Many natural language processing tasks, including the most advanced ones, routinely start
with several basic processing steps: tokenization and segmentation, and most likely also POS …

Linguistically-informed self-attention for semantic role labeling

E Strubell, P Verga, D Andor, D Weiss… - arXiv preprint arXiv …, 2018 - arxiv.org
Current state-of-the-art semantic role labeling (SRL) uses a deep neural network with no
explicit linguistic features. However, prior work has shown that gold syntax trees can …