Introducing NYTK-NerKor, a gold standard Hungarian named entity annotated corpus

E Simon, N Vadász - Text, Speech, and Dialogue: 24th International …, 2021 - Springer
Here we present NYTK-NerKor, a gold standard Hungarian named entity annotated corpus
containing 1 million tokens. This is the largest corpus of its kind. It contains balanced …

Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines

G Orosz, G Szabó, P Berkecz, Z Szántó… - … Conference on Text …, 2023 - Springer
This paper presents a set of industrial-grade text processing models for Hungarian that
achieve near state-of-the-art performance while balancing resource efficiency and accuracy …

Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian

B Baykara, T Güngör - Language Resources and Evaluation, 2022 - Springer
Due to the exponential growth in the number of documents on the Web, accessing the
salient information relevant to a user need is gaining importance, which increases the …

One format to rule them all – The emtsv pipeline for Hungarian

B Indig, B Sass, E Simon, I Mittelholcz, N Vadász… - 2019 - real.mtak.hu
We present a more efficient version of the e-magyar NLP pipeline for Hungarian called
emtsv. It integrates Hungarian NLP tools in a framework whose individual modules can be …

Introducing the CURLICAT corpora: seven-language domain specific annotated corpora from curated sources

T Váradi, B Nyéki, S Koeva, M Tadić… - Proceedings of the …, 2022 - aclanthology.org
This article presents the current outcomes of the CURLICAT CEF Telecom project, which
aims to collect and deeply annotate a set of large corpora from selected domains. The …

When MIPVU goes to No Man's Land: A new language resource for hybrid, morpheme-based metaphor identification in Hungarian

G Simon, T Bajzát, J Ballagó, Z Havasi… - Language Resources …, 2023 - Springer
The aim of the article is to present a new language resource for metaphor analysis in
corpora that is (i) a MIPVU-inspired, morpheme-based process for identifying metaphor in …

ELTE Poetry Corpus: A machine annotated database of canonical Hungarian poetry

P Horváth, P Kundráth, B Indig, Z Fellegi… - Proceedings of the …, 2022 - aclanthology.org
ELTE Poetry Corpus is a database that stores canonical Hungarian poetry with automatically
generated annotations of the poems' structural units, grammatical features and sound …

Identification and analysis of personification in Hungarian: The PerSECorp project

G Simon - Proceedings of the Thirteenth Language Resources …, 2022 - aclanthology.org
Despite the recent findings on the conceptual and linguistic organization of personification,
we have relatively little knowledge about its lexical patterns and grammatical templates. It is …

Determining Argument Structure Variants by Numerical Optimization

K Szécsényi, T Szécsényi - … Use and Linguistic Structure. Proceedings of …, 2024 - ceeol.com
The paper proposes a representation of arguments and adjuncts in terms of probability
value vectors, and presents a method to calculate argument structure solely based on …

Epic Formulas and Intertextuality in 16th Century Hungarian Historical or Epic Songs

L Seláf, V Vigyikán, P Plecháč, M Kiss - 2024 - real.mtak.hu
The first great period of Hungarian literature is the 16th century. From earlier times only a
very limited number of texts, and even fewer poems, have been preserved: a real literary …