[PDF] MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora.

T Erjavec - LREC, 2004 - Citeseer
The paper presents the third edition of the MULTEXT-East language resources, a
multilingual dataset for language engineering research and development. This standardised …

Morphosyntactic corpora and tools for Persian

M Seraji - 2015 - diva-portal.org
Abstract Seraji, M. 2015. Morphosyntactic Corpora and Tools for Persian. Studia Linguistica
Upsaliensia 16. 191 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9229-8 …

Finite-state computational morphology: An analyzer prototype for Zulu

L Pretorius, SE Bosch - Machine Translation, 2003 - Springer
As one of the largest of the 11 official languages of South Africa, Zulu is spoken by
approximately 9 million people. It forms part of a language family which is characterized by …

A basic language resource kit for Persian

M Seraji, B Megyesi, J Nivre - … and Evaluation (LREC 2012), 23-25 …, 2012 - diva-portal.org
Abstract Persian with its about 100,000,000 speakers in the world belongs to the group of
languages with less developed linguistically annotated resources and tools. The few …

[BOOK] Error detection and correction in annotated corpora

M Dickinson - 2005 - search.proquest.com
Building on work showing the harmfulness of annotation errors for both the training and
evaluation of natural language processing technologies, this thesis develops a method for …

Bilingual lexical extraction based on word alignment for improving corpus search

J Andonovski, B Šandrih, O Kitanović - The Electronic Library, 2019 - emerald.com
Purpose This paper aims to describe the structure of an aligned Serbian-German literary
corpus (SrpNemKor) contained in a digital library Bibliša. The goal of the research was to …

MULTEXT-East resources for Serbian

C Krstev, D Vitas, T Erjavec - … druzba IS 2004 Jezikovne tehnologije 9 …, 2004 - hal.science
MULTEXT-East is a multilingual dataset for language engineering research and
development. This standardised and linked set of resources covers a large number of mainly …

TermeX: A Tool for Collocation Extraction

D Delač, Z Krleža, J Šnajder, B Dalbelo Bašić… - … and Intelligent Text …, 2009 - Springer
Collocations (word combinations occurring together more often than by chance) have a wide
range of NLP applications. Many approaches for automating collocation extraction based on …

[PDF] Croatian lemmatization server

M Tadić - Formal Approaches to south Slavic and Balkan …, 2006 - Citeseer
The need for lemmatization in inflectionally rich languages is indisputable: it is applicable for
the whole range of procedures, from text search up to parsing. From two predominant …

Neural spell-checker: Beyond words with synthetic data generation

M Klemen, M Božič, ŠA Holdt… - … Conference on Text …, 2024 - Springer
Spell-checkers are valuable tools that enhance communication by identifying misspelled
words in written texts. Recent improvements in deep learning, and in particular in large …