OCR4all—An open-source tool providing a (semi-) automatic OCR workflow for historical printings
C Reul, D Christ, A Hartelt, N Balbach, M Wehner… - Applied Sciences, 2019 - mdpi.com
Optical Character Recognition (OCR) on historical printings is a challenging task mainly due
to the complexity of the layout and the highly variant typography. Nevertheless, in the last …
Towards realistic practices in low-resource natural language processing: The development set
Development sets are impractical to obtain for real low-resource languages, since using all
available data for training is often more effective. However, development sets are widely …
Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin
In this paper we describe a dataset of German and Latin ground truth (GT) for
historical OCR in the form of printed text line images paired with their transcription. This …
[BOOK] Multilayer corpus studies
A Zeldes - 2018 - taylorfrancis.com
This volume explores the opportunities afforded by the construction and evaluation of
multilayer corpora, an emerging methodology within corpus linguistics that brings about …
[PDF] Normalization of historical texts with neural network models
M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …
Corpus annotation
J Newman, C Cox - A practical handbook of corpus linguistics, 2021 - Springer
In this chapter, we provide an overview of the main concepts relating to corpus annotation,
along with some discussion of the practical aspects of creating annotated texts and working …
Summarising historical text in modern languages
We introduce the task of historical text summarisation, where documents in historical forms
of a language are summarised in the corresponding modern language. This is a …
ANNIS: A graph-based query system for deeply annotated text corpora
T Krause - 2019 - edoc.hu-berlin.de
This dissertation describes the design and implementation of an efficient
search system for linguistic corpora. The existing system, based on a relational …
Multi-task learning for historical text normalization: Size matters
Historical text normalization suffers from small datasets that exhibit high variance, and
previous work has shown that multi-task learning can be used to leverage data from related …
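Abschnittsweise Analyse sprachlicher Flüssigkeit in der Lernersprache: Das Ganze ist weniger informativ als seine Teile [Section-wise analysis of linguistic fluency in learner language: The whole is less informative than its parts]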
M Belz, C Odebrecht - Zeitschrift für germanistische Linguistik, 2022 - degruyter.com
In this corpus-based study we explore three measurements of L2 fluency (articulation rate,
filler particles, and pauses), both within and between two registers of spontaneous …