Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions

C Ramisch, S Cordeiro, A Savary, V Vincze… - Proceedings of the …, 2018 - unora.unior.it
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal
multiword expressions. We present the annotation methodology, focusing on changes from …

Evaluation of transfer learning for Polish with a text-to-text model

A Chrabrowa, Ł Dragan, K Grzegorczyk… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce a new benchmark for assessing the quality of text-to-text models for Polish.
The benchmark consists of diverse tasks and datasets: KLEJ benchmark adapted for text-to …

A survey of large language models for European languages

W Ali, S Pyysalo - arXiv preprint arXiv:2408.15040, 2024 - arxiv.org
Large Language Models (LLMs) have gained significant attention due to their high
performance on a wide range of natural language tasks since the release of ChatGPT. The …

The reference corpus of the contemporary Romanian language (CoRoLa)

VB Mititelu, D Tufiş, E Irimia - Proceedings of the Eleventh …, 2018 - aclanthology.org
We present here the largest publicly available corpus of Romanian. Its written component
contains 1,257,752,812 tokens, distributed, in an unbalanced way, in several language …

A deep learning model of spatial distance and named entity recognition (SD-NER) for flood mark text classification

R Szczepanek - Water, 2023 - mdpi.com
Information on historical flood levels can be communicated verbally, in documents, or in the
form of flood marks. The latter are the most useful from the point of view of public awareness …

The design of semi-lexicality

H Klockmann - Evidence from Case and Agreement in the …, 2017 - lotpublications.nl
Semi-lexicality refers to lexical items which show both lexical and functional properties …

Latvian national corpora collection – korpuss.lv

B Saulīte, R Darģis, N Gruzitis, I Auziņa… - Proceedings of the …, 2022 - aclanthology.org
LNCC is a diverse collection of Latvian language corpora representing both written and
spoken language and is useful for both linguistic research and language modelling. The …

Arguments and adjuncts in Universal Dependencies

A Przepiórkowski, A Patejuk - Proceedings of the 27th …, 2018 - aclanthology.org
The aim of this paper is to argue for a coherent Universal Dependencies approach to the
core vs. non-core distinction. We demonstrate inconsistencies in the current version 2 of UD …

A rule-based grapheme-to-phoneme conversion system

P Kłosowski - Applied Sciences, 2022 - mdpi.com
This article presents a rule-based grapheme-to-phoneme conversion method and algorithm
for Polish. It should be noted that the fundamental grapheme-to-phoneme conversion rules …

Sociolinguistics in East Central Europe

M Kontra, M Sloboda, J Nekvapil… - … Around the World, 2023 - taylorfrancis.com
Four countries in East Central Europe are discussed. For Hungary, Kontra describes urban
dialectology, the contact varieties of Hungarian, and language policy and rights in Hungary …