Dialect-to-standard normalization: A large-scale multilingual evaluation

O Kuparinen, AM Haddad… - Conference on Empirical …, 2023 - researchportal.helsinki.fi
Text normalization methods have been commonly applied to historical language or user-
generated content, but less often to dialectal transcriptions. In this paper, we introduce …

Finnish dialect identification: The effect of audio and text

M Hämäläinen, K Alnajjar, N Partanen… - arXiv preprint arXiv …, 2021 - arxiv.org
Finnish is a language with multiple dialects that not only differ from each other in terms of
accent (pronunciation) but also in terms of morphological forms and lexical choice. We …

Murreviikko - a dialectologically annotated and normalized dataset of Finnish tweets

O Kuparinen - Tenth Workshop on NLP for Similar Languages …, 2023 - aclanthology.org
This paper presents Murreviikko, a dataset of dialectal Finnish tweets which have been
dialectologically annotated and manually normalized to a standard form. The dataset can be …

Ve'rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement

K Alnajjar, M Hämäläinen, J Rueter… - arXiv preprint arXiv …, 2020 - arxiv.org
We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to
re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur …

Rules ruling neural networks - neural vs. rule-based grammar checking for a low resource language

L Wiechetek, F Pirinen… - Recent Advances in …, 2021 - researchportal.helsinki.fi
We investigate both rule-based and machine learning methods for the task of compound
error correction and evaluate their efficiency for North Sámi, a low resource language. The …

The current state of Finnish NLP

M Hämäläinen, K Alnajjar - arXiv preprint arXiv:2109.11326, 2021 - arxiv.org
There are a lot of tools and resources available for processing Finnish. In this paper, we
survey recent papers focusing on Finnish NLP related to many different subcategories of …

Lemmatization of historical old literary Finnish texts in modern orthography

M Hämäläinen, N Partanen, K Alnajjar - arXiv preprint arXiv:2107.03266, 2021 - arxiv.org
Texts written in Old Literary Finnish represent the first literary work ever written in Finnish
starting from the 16th century. There have been several projects in Finland that have …

Help from the neighbors: Estonian dialect normalization using a Finnish dialect generator

M Hämäläinen, K Alnajjar, T Tuisk - Proceedings of the Third …, 2022 - aclanthology.org
While standard Estonian is not a low-resourced language, the different dialects of the
language are under-resourced from the point of view of NLP, given that there are no vast …

Computational exploration of the origin of mood in literary texts

E Öhman, RH Rossi - … of the 2nd International Workshop on …, 2022 - aclanthology.org
This paper is a methodological exploration of the origin of mood in early modern and
modern Finnish literary texts using computational methods. We discuss the pre-processing …

From plenipotentiary to puddingless: Users and uses of new words in early English letters

T Säily, E Mäkelä, M Hämäläinen - arXiv preprint arXiv:2103.09926, 2021 - arxiv.org
We study neologism use in two samples of early English correspondence, from 1640–1660
and 1760--1780. Of especial interest are the early adopters of new vocabulary, the social …