Named entity recognition in the Romanian legal domain
Recognition of named entities present in text is an important step towards information
extraction and natural language understanding. This work presents a named entity …
Abstractive text summarization for Hungarian
In our research we have created a text summarization software tool for Hungarian using
multilingual and Hungarian BERT-based models. Two types of text summarization method …
Introducing the CURLICAT corpora: seven-language domain specific annotated corpora from curated sources
This article presents the current outcomes of the CURLICAT CEF Telecom project, which
aims to collect and deeply annotate a set of large corpora from selected domains. The …
UlyssesNER-Br: a corpus of Brazilian legislative documents for named entity recognition
The amount of legislative documents produced within the past decade has risen
dramatically, making it difficult for law practitioners to consult and update legislation. Named …
Ulysses Tesemõ: a new large corpus for Brazilian legal and governmental domain
The increasing use of artificial intelligence methods in the legal field has sparked interest in
applying Natural Language Processing techniques to handle legal tasks and reduce the …
FuLG: 150B Romanian Corpus for Language Model Pretraining
Research in the field of language models is rapidly evolving, with many open models being
released to the public. Openly available pretraining corpora usually focus on only a handful …
HistNERo: Historical named entity recognition for the Romanian language
This work introduces HistNERo, the first Romanian corpus for Named Entity Recognition
(NER) in historical newspapers. The dataset contains 323k tokens of text, covering more …
Annotators-in-the-loop: testing a novel annotation procedure on Italian case law
The availability of annotated legal corpora is crucial for a number of tasks, such as legal
search, legal information retrieval, and predictive justice. Annotation is mostly assumed to be …
Automatic Extraction of the Romanian Academic Word List: Data and Methods
This paper presents the methodology and data used for the automatic extraction of the
Romanian Academic Word List (Ro-AWL). Academic Word Lists are useful in both L2 and L1 …
In-depth evaluation of Romanian natural language processing pipelines
With the increased size of Universal Dependencies treebanks, several basic language
processing kits (BLARK) for multiple languages appeared in recent years, indicating …