Risamálheild: A very large Icelandic text corpus

S Steingrímsson, S Helgadóttir… - Proceedings of the …, 2018 - aclanthology.org
We present Risamálheild, the Icelandic Gigaword Corpus (IGC), a corpus containing more
than one billion running words from mostly contemporary texts. The work was carried out …

A Warm Start and a Clean Crawled Corpus--A Recipe for Good Language Models

V Snæbjarnarson, HB Símonarson… - ar…

Developing a PoS-tagged corpus using existing tools

H Loftsson, JH Yngvason, S Helgadóttir… - … SaLTMiL Workshop on …, 2010 - academia.edu
In this paper, we describe the development of a new tagged corpus of Icelandic, consisting
of about 1 million tokens. The goal is to use the corpus, among other things, as a new gold …

A Universal Dependencies conversion pipeline for a Penn-format constituency treebank

Þ Arnardóttir, H Hafsteinsson… - Proceedings of the …, 2020 - aclanthology.org
The topic of this paper is a rule-based pipeline for converting constituency treebanks based
on the Penn Treebank format to Universal Dependencies (UD). We describe an Icelandic …