Aranea: Yet another family of (comparable) web corpora
Our paper deals with an on-going project in the framework of which, by means of open-source
and free tools, a family of web corpora is being created that would (to a large extent) …
SUBTLEX-PL: subtitle-based word frequency estimates for Polish
We present SUBTLEX-PL, Polish word frequencies based on movie subtitles. In two
lexical decision experiments, we compare the new measures with frequency estimates …
Corpus-based vocabulary lists for language learners for nine languages
We present the KELLY project and its work on developing monolingual and bilingual word
lists for language learning, using corpus methods, for nine languages and thirty-six …
MULTEXT-East: morphosyntactic resources for Central and Eastern European languages
The paper presents the MULTEXT-East language resources, a multilingual dataset for
language engineering research, focused on the morphosyntactic level of linguistic …
Morfeusz reloaded
The paper presents recent developments in Morfeusz, a morphological analyser for Polish.
The program, being already a fundamental resource for processing Polish, has been …
National Corpus of Polish
The paper presents the main results of the National Corpus of Polish project, which took
place from December 2007 to June 2011, including: the sizes of the main corpus and …
Compatible sketch grammars for comparable corpora
Our paper describes an on-going experiment aimed at creating a family of billion-token web
corpora that could to a large extent deserve the designation “comparable”: corpora are of the …
Part of speech tagging for Polish: State of the art and future perspectives
In this paper we discuss the intricacies of Polish language part of speech tagging, present
the current state of the art by comparing available taggers in detail and show the main …
Terminology extraction from medical texts in Polish
Background: Hospital documents contain free text describing the most important facts
relating to patients and their illnesses. These documents are written in specific language …
Beyond the transfer-and-merge wordnet construction: plWordNet and a comparison with WordNet
Wordnets are lexico-semantic resources essential in many NLP tasks. Princeton WordNet is
the most widely known, and the most influential, among them. Wordnets for languages other …