Tiered tagging and combined language models classifiers

D Tufiş - International Workshop on Text, Speech and Dialogue, 1999 - Springer
We address the problem of morpho-syntactic disambiguation of arbitrary texts in a highly
inflectional natural language. We use a large tagset (615 tags), EAGLES and MULTEXT …

Automatic diacritic restoration for resource-scarce languages

G De Pauw, PW Wagacha, GM De Schryver - Text, Speech and Dialogue …, 2007 - Springer
The orthography of many resource-scarce languages includes diacritically marked
characters. Falling outside the scope of the standard Latin encoding, these characters are …

Homograph disambiguation through selective diacritic restoration

S Alqahtani, H Aldarmaki, M Diab - arXiv preprint arXiv:1912.04479, 2019 - arxiv.org
Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly
prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic …

Diacritics restoration: Learning from letters versus learning from words

RF Mihalcea - … Linguistics and Intelligent Text Processing: Third …, 2002 - Springer
This paper presents a method for diacritics restoration based on learning mechanisms that
act at letter level. This technique is new to our knowledge, and we compare it with the well …

Restoring tone-marks in standard Yorùbá electronic text: improved model

FO Asahiah, OA Odejobi, ER Adagunodo - Computer Science, 2017 - journals.agh.edu.pl
Diacritic Restoration is a necessity in the processing of languages with Latin-based scripts
that utilizes letters outside the basic Latin alphabet used by English language. Yorùbá is one …

Statistical unicodification of African languages

KP Scannell - Language resources and evaluation, 2011 - Springer
Many languages in Africa are written using Latin-based scripts, but often with extra diacritics
(e.g. dots below in Igbo: ị, ọ, ụ) or modifications to the letters themselves (e.g. open vowels …

Corpus-based diacritic restoration for South Slavic languages

N Ljubešić, T Erjavec, D Fišer - Proceedings of the Tenth …, 2016 - aclanthology.org
In computer-mediated communication, Latin-based scripts users often omit diacritics when
writing. Such text is typically easily understandable to humans but very difficult for …

Diacritization as a machine translation and as a sequence labeling problem

T Schlippe, TL Nguyen, S Vogel - … of the 8th Conference of the …, 2008 - aclanthology.org
In this paper we describe and compare two techniques for the automatic diacritization of
Arabic text: First, we treat diacritization as a monotone machine translation problem …

DIAC+: A professional diacritics recovering system

D Tufiş, A Ceauşu - Proceedings of LREC 2008, 2008 - researchgate.net
In languages that use diacritical characters, if these special signs are stripped-off from a
word, the resulted string of characters may not exist in the language, and therefore its …

Deep learning for automatic diacritics restoration in Romanian

M Nuţu, B Lőrincz, A Stan - 2019 IEEE 15th International …, 2019 - ieeexplore.ieee.org
In this paper we address the issue of automatic diacritics restoration (ADR) for Romanian
using deep learning strategies. We compare 6 separate architectures with various mixtures …