How well does multiple OCR error correction generalize?

WB Lund, EK Ringger… - Document Recognition and …, 2014 - spiedigitallibrary.org
As the digitization of historical documents, such as newspapers, becomes more common,
the need of the archive patron for accurate digital text from those documents increases …

The NoisyOffice database: a corpus to train supervised machine learning filters for image processing

MJ Castro-Bleda, S España-Boquera… - The Computer …, 2020 - academic.oup.com
This paper presents the 'NoisyOffice' database. It consists of images of printed text
documents with noise mainly caused by uncleanliness from a generic office, such as coffee …

[BOOK][B] Ensemble Methods for Historical Machine-Printed Document Recognition

WB Lund - 2014 - search.proquest.com
The usefulness of digitized documents is directly related to the quality of the extracted text.
Optical Character Recognition (OCR) has reached a point where well-formatted and clean …

Evaluating supervised topic models in the presence of OCR errors

D Walker, E Ringger, K Seppi - Document Recognition and …, 2013 - spiedigitallibrary.org
Supervised topic models are promising tools for text analytics that simultaneously model
topical patterns in document collections and relationships between those topics and …

[BOOK][B] Bayesian Text Analytics for Document Collections

DD Walker IV - 2012 - search.proquest.com
Modern document collections are too large to annotate and curate manually. As increasingly
large amounts of data become available, historians, librarians and other scholars …

Aligning transcript of historical documents using dynamic programming

I Rabaev, R Cohen, J El-Sana… - … and Retrieval XXII, 2015 - spiedigitallibrary.org
We present a simple and accurate approach for aligning historical documents with their
corresponding transcription. First, a representative of each letter in the historical document is …