Risamálheild: A very large Icelandic text corpus
We present Risamálheild, the Icelandic Gigaword Corpus (IGC), a corpus containing more
than one billion running words from mostly contemporary texts. The work was carried out …
A Warm Start and a Clean Crawled Corpus--A Recipe for Good Language Models
V Snæbjarnarson, HB Símonarson… - ar…
… a PoS-tagged corpus using existing tools
In this paper, we describe the development of a new tagged corpus of Icelandic, consisting
of about 1 million tokens. The goal is to use the corpus, among other things, as a new gold …
A Universal Dependencies conversion pipeline for a Penn-format constituency treebank
The topic of this paper is a rule-based pipeline for converting constituency treebanks based
on the Penn Treebank format to Universal Dependencies (UD). We describe an Icelandic …