Transformer for handwritten text recognition using bidirectional post-decoding
Most recently, Transformers, which are recurrence-free neural network architectures,
achieved tremendous performance on various Natural Language Processing (NLP) tasks …
Natural language processing for cultural heritage domains
C Sporleder - Language and Linguistics Compass, 2010 - Wiley Online Library
Museums, archives, libraries and other cultural heritage institutes maintain large collections
of artefacts, which are valuable knowledge sources for both experts and interested lay …
A survey of text alignment visualization
Text alignment is one of the fundamental techniques in text-related domains like natural
language processing, computational linguistics, and digital humanities. It compares two or …
An OCR post-correction approach using deep learning for processing medical reports
S Karthikeyan, AGS de Herrera… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
According to a recent Deloitte study, the COVID-19 pandemic continues to place a huge
strain on the global health care sector. Covid-19 has also catalysed digital transformation …
OCR post-correction for detecting adversarial text images
The number of images with embedded text shared on Online Social Networks (OSNs), such
as Twitter or Facebook, has been growing in recent years. It is becoming important to …
Optical character recognition of 19th century classical commentaries: the current state of affairs
M Romanello, S Najem-Meyer… - Proceedings of the 6th …, 2021 - dl.acm.org
Together with critical editions and translations, commentaries are one of the main genres of
publication in literary and textual scholarship, and have a century-long tradition. Yet, the …
Multi-input attention for unsupervised OCR correction
We propose a novel approach to OCR post-correction that exploits repeated texts in large
corpora both as a source of noisy target outputs for unsupervised training and as a source of …
A fast alignment scheme for automatic OCR evaluation of books
This paper aims to evaluate the accuracy of optical character recognition (OCR) systems on
real scanned books. The ground truth e-texts are obtained from the Project Gutenberg …
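As a generic illustration only, and not the alignment scheme proposed in this paper, OCR accuracy against a ground-truth e-text is commonly reported as the character error rate (CER). CER is the character-level Levenshtein distance between the OCR output and the reference, divided by the reference length. The sketch below assumes plain Python strings. The function names and sample strings are hypothetical. Its quadratic dynamic program is fine for a line or a paragraph but becomes expensive for whole books, which is presumably the kind of cost a fast alignment scheme is meant to reduce.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    turning ref into hyp (classic O(len(ref) * len(hyp)) dynamic program).
    Hypothetical helper for illustration, not the paper's method."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from ref
                curr[j - 1] + 1,         # insert h into ref
                prev[j - 1] + (r != h),  # substitute (cost 0 if characters match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / length of the ground-truth reference."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical ground-truth line and noisy OCR output.
    ground_truth = "the quick brown fox"
    ocr_output = "tbe quick brovvn fox"
    print(f"CER = {character_error_rate(ground_truth, ocr_output):.3f}")
```

Here the OCR line needs one substitution ("h" to "b") and two edits for "w" to "vv", giving 3 / 19, or about 0.158.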
PNRank: Unsupervised ranking of person name entities from noisy OCR text
Text databases have grown tremendously in number, size, and volume over the last few
decades. Optical Character Recognition (OCR) software is used to scan the text and make …
Improving OCR accuracy on early printed books by utilizing cross fold training and voting
In this paper we introduce a method that significantly reduces the character error rates for
OCR text obtained from OCRopus models trained on early printed books. The method uses …