Aranea: Yet another family of (comparable) web corpora

V Benko - Text, Speech and Dialogue: 17th International …, 2014 - Springer
Our paper deals with an on-going project in the framework of which, by means of open-
source and free tools, a family of web corpora is being created that would (to a large extent) …

SUBTLEX-PL: subtitle-based word frequency estimates for Polish

P Mandera, E Keuleers, Z Wodniecka… - Behavior research …, 2015 - Springer
We present SUBTLEX-PL, Polish word frequencies based on movie subtitles. In two
lexical decision experiments, we compare the new measures with frequency estimates …

Corpus-based vocabulary lists for language learners for nine languages

A Kilgarriff, F Charalabopoulou, M Gavrilidou… - Language resources …, 2014 - Springer
We present the KELLY project and its work on developing monolingual and bilingual word
lists for language learning, using corpus methods, for nine languages and thirty-six …

MULTEXT-East: morphosyntactic resources for Central and Eastern European languages

T Erjavec - Language resources and evaluation, 2012 - Springer
The paper presents the MULTEXT-East language resources, a multilingual dataset for
language engineering research, focused on the morphosyntactic level of linguistic …

Morfeusz reloaded

M Woliński - Proceedings of the ninth international conference on …, 2014 - Citeseer
The paper presents recent developments in Morfeusz, a morphological analyser for Polish.
The program, being already a fundamental resource for processing Polish, has been …

National Corpus of Polish

A Przepiórkowski, M Bańko, RL Górski… - Proceedings of the 5th …, 2011 - academia.edu
The paper presents the main results of the National Corpus of Polish project, which took
place from December 2007 to June 2011, including: the sizes of the main corpus and …

Compatible sketch grammars for comparable corpora

V Benko - Proceedings of the XVI EURALEX International …, 2014 - euralex.org
Our paper describes an on-going experiment aimed at creating a family of billion-token web
corpora that could to a large extent deserve the designation “comparable”: corpora are of the …

Part of speech tagging for Polish: State of the art and future perspectives

Ł Kobyliński, W Kieraś - … Conference on Intelligent Text Processing and …, 2016 - Springer
In this paper we discuss the intricacies of part-of-speech tagging for Polish, present
the current state of the art by comparing available taggers in detail and show the main …

Terminology extraction from medical texts in Polish

M Marciniak, A Mykowiecka - Journal of biomedical semantics, 2014 - Springer
Background: Hospital documents contain free text describing the most important facts
relating to patients and their illnesses. These documents are written in specific language …

Beyond the transfer-and-merge wordnet construction: plWordNet and a comparison with WordNet

M Maziarz, M Piasecki, E Rudnicka… - Proceedings of the …, 2013 - aclanthology.org
Wordnets are lexico-semantic resources essential in many NLP tasks. Princeton WordNet is
the most widely known, and the most influential, among them. Wordnets for languages other …