Multilingual is not enough: BERT for Finnish

A Virtanen, J Kanerva, R Ilo, J Luoma… - arXiv preprint arXiv …, 2019 - arxiv.org
Deep learning-based language models pretrained on large unannotated text corpora have
been demonstrated to allow efficient transfer learning for natural language processing, with …

Universal dependencies v1: A multilingual treebank collection

J Nivre, MC De Marneffe, F Ginter… - Proceedings of the …, 2016 - aclanthology.org
Cross-linguistically consistent annotation is necessary for sound comparative evaluation
and cross-lingual learning experiments. It is also useful for multilingual system development …

Universal Stanford dependencies: A cross-linguistic typology.

MC De Marneffe, T Dozat, N Silveira, K Haverinen… - LREC, 2014 - lrec-conf.org
Revisiting the now de facto standard Stanford dependency representation, we propose an
improved taxonomy to capture grammatical relations across languages, including …

Joint morphological and syntactic analysis for richly inflected languages

B Bohnet, J Nivre, I Boguslavsky, R Farkas… - Transactions of the …, 2013 - direct.mit.edu
Joint morphological and syntactic analysis has been proposed as a way of improving
parsing accuracy for richly inflected languages. Starting from a transition-based model for …

From the world to word order: Deriving biases in noun phrase order from statistical properties of the world

J Culbertson, M Schouwstra, S Kirby - Language, 2020 - muse.jhu.edu
The world's languages exhibit striking diversity. At the same time, recurring linguistic
patterns suggest the possibility that this diversity is shaped by features of human cognition …

FinEst BERT and CroSloEngual BERT: less is more in multilingual models

M Ulčar, M Robnik-Šikonja - … Conference, TSD 2020, Brno, Czech Republic …, 2020 - Springer
Large pretrained masked language models have become state-of-the-art solutions for many
NLP problems. The research has been mostly focused on English language, though. While …

A broad-coverage corpus for Finnish named entity recognition

J Luoma, M Oinonen, M Pyykönen… - Proceedings of the …, 2020 - aclanthology.org
We present a new manually annotated corpus for broad-coverage named entity recognition
for Finnish. Building on the original Universal Dependencies Finnish corpus of 754 …

Classifying online corporate reputation with machine learning: a study in the banking domain

A Rantanen, J Salminen, F Ginter, BJ Jansen - Internet Research, 2020 - emerald.com
Purpose: User-generated social media comments can be a useful source of information for
understanding online corporate reputation. However, the manual classification of these …

Exploring predictive uncertainty and calibration in NLP: A study on the impact of method & data scarcity

D Ulmer, J Frellsen, C Hardmeier - arXiv preprint arXiv:2210.15452, 2022 - arxiv.org
We investigate the problem of determining the predictive confidence (or, conversely,
uncertainty) of a neural classifier through the lens of low-resource languages. By training …

IMST: A revisited Turkish dependency treebank

U Sulubacak, G Eryiğit… - … Conference on Turkic …, 2016 - researchportal.helsinki.fi
In this paper, we present a critical analysis of the dependency annotation framework used in
the METU-Sabancı Treebank (MST), and propose new annotation schemes that would …