[PDF] MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora.
The paper presents the third edition of the MULTEXT-East language resources, a
multilingual dataset for language engineering research and development. This standardised …
Morphosyntactic corpora and tools for Persian
M Seraji - 2015 - diva-portal.org
Abstract Seraji, M. 2015. Morphosyntactic Corpora and Tools for Persian. Studia Linguistica
Upsaliensia 16. 191 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9229-8 …
Finite-state computational morphology: An analyzer prototype for Zulu
L Pretorius, SE Bosch - Machine Translation, 2003 - Springer
As one of the largest of the 11 official languages of South Africa, Zulu is spoken by
approximately 9 million people. It forms part of a language family which is characterized by …
A basic language resource kit for Persian
Abstract Persian with its about 100,000,000 speakers in the world belongs to the group of
languages with less developed linguistically annotated resources and tools. The few …
[BOOK] Error detection and correction in annotated corpora
M Dickinson - 2005 - search.proquest.com
Building on work showing the harmfulness of annotation errors for both the training and
evaluation of natural language processing technologies, this thesis develops a method for …
Bilingual lexical extraction based on word alignment for improving corpus search
Purpose This paper aims to describe the structure of an aligned Serbian-German literary
corpus (SrpNemKor) contained in a digital library Bibliša. The goal of the research was to …
MULTEXT-East resources for Serbian
MULTEXT-East is a multilingual dataset for language engineering research and
development. This standardised and linked set of resources covers a large number of mainly …
TermeX: A Tool for Collocation Extraction
Collocations–word combinations occurring together more often than by chance–have a wide
range of NLP applications. Many approaches for automating collocation extraction based on …
[PDF] Croatian lemmatization server
The need for lemmatization in inflectionally rich languages is indisputable: it is applicable for
the whole range of procedures—from textsearch, up to parsing. From two predominant …
Neural spell-checker: Beyond words with synthetic data generation
Spell-checkers are valuable tools that enhance communication by identifying misspelled
words in written texts. Recent improvements in deep learning, and in particular in large …