Introducing NYTK-NerKor, a gold standard Hungarian named entity annotated corpus
Here we present NYTK-NerKor, a gold standard Hungarian named entity annotated corpus
containing 1 million tokens. This is the largest corpus ever in its kind. It contains balanced …
Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines
This paper presents a set of industrial-grade text processing models for Hungarian that
achieve near state-of-the-art performance while balancing resource efficiency and accuracy …
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian
Due to the exponential growth in the number of documents on the Web, accessing the
salient information relevant to a user need is gaining importance, which increases the …
One format to rule them all–The emtsv pipeline for Hungarian
We present a more efficient version of the e-magyar NLP pipeline for Hungarian called
emtsv. It integrates Hungarian NLP tools in a framework whose individual modules can be …
Introducing the CURLICAT corpora: seven-language domain specific annotated corpora from curated sources
This article presents the current outcomes of the CURLICAT CEF Telecom project, which
aims to collect and deeply annotate a set of large corpora from selected domains. The …
When MIPVU goes to No Man's Land: A new language resource for hybrid, morpheme-based metaphor identification in Hungarian
The aim of the article is to present a new language resource for metaphor analysis in
corpora that is (i) a MIPVU-inspired, morpheme-based process for identifying metaphor in …
ELTE Poetry Corpus: A machine annotated database of canonical Hungarian poetry
P Horváth, P Kundráth, B Indig, Z Fellegi… - Proceedings of the …, 2022 - aclanthology.org
ELTE Poetry Corpus is a database that stores canonical Hungarian poetry with automatically
generated annotations of the poems' structural units, grammatical features and sound …
Identification and analysis of personification in Hungarian: The PerSECorp project
G Simon - Proceedings of the Thirteenth Language Resources …, 2022 - aclanthology.org
Despite the recent findings on the conceptual and linguistic organization of personification,
we have relatively little knowledge about its lexical patterns and grammatical templates. It is …
Determining Argument Structure Variants by Numerical Optimization
The paper proposes a representation of arguments and adjuncts in terms of probability
value vectors, and presents a method to calculate argument structure solely based on …
Epic Formulas and Intertextuality in 16th Century Hungarian Historical or Epic Songs
The first great period of Hungarian literature is the 16th century. From earlier times only a
very limited number of texts, and even less poems, have been conserved: a real literary …