The German reference corpus DeReKo: New developments–new opportunities

M Kupietz, H Lüngen, P Kamocki… - Proceedings of the …, 2018 - aclanthology.org
This paper discusses current trends in DeReKo, the German Reference Corpus, concerning
legal issues around the recent German copyright reform with positive implications for corpus …

Little strokes fell great oaks: Creating CoRoLa, the reference corpus of contemporary Romanian

D Tufiș, V Barbu Mititelu, E Irimia, V Păiș, R Ion… - 2019 - dspace.bcu-iasi.ro
The paper presents the quite long-standing tradition of Romanian corpus acquisition and
processing, which reaches its peak with the reference corpus of contemporary Romanian …

CoRoSeOf - an annotated corpus of Romanian sexist and offensive tweets

DC Hoefels, Ç Çöltekin… - Proceedings of the …, 2022 - aclanthology.org
This paper introduces CoRoSeOf, a large corpus of Romanian social media manually
annotated for sexist and offensive language. We describe the annotation process of the …

In-depth evaluation of Romanian natural language processing pipelines

V Pais, R Ion, AM Avram, M Mitrofan, D Tufis - Romanian Journal of …, 2021 - romjist.ro
With the increased size of Universal Dependencies tree banks, several basic language
processing kits (BLARK) for multiple languages appeared in recent years, indicating …

Deep learning for automatic diacritics restoration in Romanian

M Nuţu, B Lőrincz, A Stan - 2019 IEEE 15th International …, 2019 - ieeexplore.ieee.org
In this paper we address the issue of automatic diacritics restoration (ADR) for Romanian
using deep learning strategies. We compare 6 separate architectures with various mixtures …

Automatic extraction of the Romanian academic word list: Data and methods

AM Bucur, A Dincă, M Chitez, R Rogobete - …
… the most recent Balanced Corpus of Modern Latvian (LVK2018) from various online sources.
Most of the new corpora are …

Romanian part of speech tagging using LSTM networks

B Lőrincz, M Nuţu, A Stan - 2019 IEEE 15th International …, 2019 - ieeexplore.ieee.org
In this paper we present LSTM based neural network architectures for determining the part
of speech (POS) tags for Romanian words. LSTM networks combined with fully-connected …