Creating the European Literary Text Collection (ELTeC): Challenges and perspectives

C Schöch, T Erjavec, R Patras… - Christof Schöch; Tomaz …, 2021 - comum.rcaap.pt
The aim of this contribution is to reflect on the process of building the multilingual European
Literary Text Collection (ELTeC) that is being created in the framework of the networking …

Universal dependency annotation for multilingual parsing

R McDonald, J Nivre… - Proceedings of the …, 2013 - aclanthology.org
We present a new collection of treebanks with homogeneous syntactic dependency
annotation for six languages: German, English, Swedish, Spanish, French and Korean. To …

What does neural bring? Analysing improvements in morphosyntactic annotation and lemmatisation of Slovenian, Croatian and Serbian

N Ljubešić, K Dobrovoljc - Proceedings of the 7th workshop on …, 2019 - aclanthology.org
We present experiments on Slovenian, Croatian and Serbian morphosyntactic annotation
and lemmatisation between the former state-of-the-art for these three languages and one of …

CLASSLA-Stanza: The next step for linguistic processing of South Slavic languages

L Terčon, N Ljubešić - arXiv preprint arXiv:2308.04255, 2023 - arxiv.org
We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of the South
Slavic languages, which is based on the Stanza natural language processing pipeline. We …

The reference corpus of the contemporary Romanian language (CoRoLa)

VB Mititelu, D Tufiş, E Irimia - Proceedings of the Eleventh …, 2018 - aclanthology.org
We present here the largest publicly available corpus of Romanian. Its written component
contains 1,257,752,812 tokens, distributed, in an unbalanced way, in several language …

A tiered CRF tagger for Polish

A Radziszewski - Intelligent tools for building a scientific information …, 2013 - Springer
In this paper we present a new approach to morphosyntactic tagging of Polish by bringing
together Conditional Random Fields and tiered tagging. Our proposal also allows to take …

Lemmatization and morphosyntactic tagging of Croatian and Serbian

Ž Agić, N Ljubešić, D Merkler - Proceedings of the 4th Biennial …, 2013 - aclanthology.org
We investigate state-of-the-art statistical models for lemmatization and morphosyntactic
tagging of Croatian and Serbian. The models stem from a new manually annotated …

The Janes project: language resources and tools for Slovene user generated content

D Fišer, N Ljubešić, T Erjavec - Language resources and evaluation, 2020 - Springer
The paper presents the results of the Janes project, which aimed to develop language
resources and tools for Slovene user generated content. The paper first describes the 200 …

Little strokes fell great oaks: Creating CoRoLa, the reference corpus of contemporary Romanian

D Tufiș, V Barbu Mititelu, E Irimia, V Păiș, R Ion… - 2019 - dspace.bcu-iasi.ro
The paper presents the quite long-standing tradition of Romanian corpus acquisition and
processing, which reaches its peak with the reference corpus of contemporary Romanian …

[BOOK] Multilayer corpus studies

A Zeldes - 2018 - taylorfrancis.com
This volume explores the opportunities afforded by the construction and evaluation of
multilayer corpora, an emerging methodology within corpus linguistics that brings about …