Named entity recognition in the Romanian legal domain

V Păiș, M Mitrofan, CL Gasan… - Proceedings of the …, 2021 - aclanthology.org
Recognition of named entities present in text is an important step towards information
extraction and natural language understanding. This work presents a named entity …

Abstractive text summarization for Hungarian

ZG Yang, Á Agócs, G Kusper, T Váradi - Annales Mathematicae et …, 2021 - real.mtak.hu
In our research we have created a text summarization software tool for Hungarian using
multilingual and Hungarian BERT-based models. Two types of text summarization method …

Introducing the CURLICAT corpora: seven-language domain specific annotated corpora from curated sources

T Váradi, B Nyéki, S Koeva, M Tadić… - Proceedings of the …, 2022 - aclanthology.org
This article presents the current outcomes of the CURLICAT CEF Telecom project, which
aims to collect and deeply annotate a set of large corpora from selected domains. The …

UlyssesNER-Br: a corpus of Brazilian legislative documents for named entity recognition

HO Albuquerque, R Costa, G Silvestre, E Souza… - … Processing of the …, 2022 - Springer
The amount of legislative documents produced within the past decade has risen
dramatically, making it difficult for law practitioners to consult and update legislation. Named …

Ulysses Tesemõ: a new large corpus for Brazilian legal and governmental domain

FA Siqueira, D Vitório, E Souza, JAP Santos… - Language Resources …, 2024 - Springer
The increasing use of artificial intelligence methods in the legal field has sparked interest in
applying Natural Language Processing techniques to handle legal tasks and reduce the …

FuLG: 150B Romanian Corpus for Language Model Pretraining

VA Bădoiu, MV Dumitru, AM Gherghescu… - arXiv preprint arXiv …, 2024 - arxiv.org
Research in the field of language models is rapidly evolving, with many open models being
released to the public. Openly available pretraining corpora usually focus on only a handful …

HistNERo: Historical named entity recognition for the Romanian language

AM Avram, A Iuga, GV Manolache, VC Matei… - … on Document Analysis …, 2024 - Springer
This work introduces HistNERo, the first Romanian corpus for Named Entity Recognition
(NER) in historical newspapers. The dataset contains 323k tokens of text, covering more …

Annotators-in-the-loop: testing a novel annotation procedure on Italian case law

E Zanoli, M Barbini, D Riva, S Picascia… - Proceedings of the 17th …, 2023 - air.unimi.it
The availability of annotated legal corpora is crucial for a number of tasks, such as legal
search, legal information retrieval, and predictive justice. Annotation is mostly assumed to be …

Automatic Extraction of the Romanian Academic Word List: Data and Methods

AM Bucur, A Dincă, M Chitez, R Rogobete - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents the methodology and data used for the automatic extraction of the
Romanian Academic Word List (Ro-AWL). Academic Word Lists are useful in both L2 and L1 …

In-depth evaluation of Romanian natural language processing pipelines

V Pais, R Ion, AM Avram, M Mitrofan, D Tufis - Romanian Journal of …, 2021 - romjist.ro
With the increased size of Universal Dependencies tree banks, several basic language
processing kits (BLARK) for multiple languages appeared in recent years, indicating …