Dialect-to-standard normalization: A large-scale multilingual evaluation
O Kuparinen, AM Haddad… - Conference on Empirical …, 2023 - researchportal.helsinki.fi
Text normalization methods have been commonly applied to historical language or user-
generated content, but less often to dialectal transcriptions. In this paper, we introduce …
Finnish dialect identification: The effect of audio and text
Finnish is a language with multiple dialects that not only differ from each other in terms of
accent (pronunciation) but also in terms of morphological forms and lexical choice. We …
Murreviikko - a dialectologically annotated and normalized dataset of Finnish tweets
O Kuparinen - Tenth Workshop on NLP for Similar Languages …, 2023 - aclanthology.org
This paper presents Murreviikko, a dataset of dialectal Finnish tweets which have been
dialectologically annotated and manually normalized to a standard form. The dataset can be …
Ve'rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement
We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to
re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur …
Rules ruling neural networks–neural vs. rule-based grammar checking for a low resource language
L Wiechetek, F Pirinen… - Recent Advances in …, 2021 - researchportal.helsinki.fi
We investigate both rule-based and machine learning methods for the task of compound
error correction and evaluate their efficiency for North Sámi, a low resource language. The …
The current state of Finnish NLP
There are a lot of tools and resources available for processing Finnish. In this paper, we
survey recent papers focusing on Finnish NLP related to many different subcategories of …
Lemmatization of historical old literary Finnish texts in modern orthography
Texts written in Old Literary Finnish represent the first literary work ever written in Finnish
starting from the 16th century. There have been several projects in Finland that have …
Help from the neighbors: Estonian dialect normalization using a Finnish dialect generator
While standard Estonian is not a low-resourced language, the different dialects of the
language are under-resourced from the point of view of NLP, given that there are no vast …
Computational exploration of the origin of mood in literary texts
E Öhman, RH Rossi - … of the 2nd International Workshop on …, 2022 - aclanthology.org
This paper is a methodological exploration of the origin of mood in early modern and
modern Finnish literary texts using computational methods. We discuss the pre-processing …
From plenipotentiary to puddingless: Users and uses of new words in early English letters
We study neologism use in two samples of early English correspondence, from 1640–1660
and 1760–1780. Of especial interest are the early adopters of new vocabulary, the social …