Survey of post-OCR processing approaches
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …
Survey of automatic spelling correction
Automatic spelling correction has been receiving sustained research attention. Although
each article contains a brief introduction to the topic, there is a lack of work that would …
Parallel iterative edit models for local sequence transduction
We present a Parallel Iterative Edit (PIE) model for the problem of local sequence
transduction arising in tasks like Grammatical error correction (GEC). Recent approaches …
Optical character recognition with neural networks and post-correction with finite state methods
The optical character recognition (OCR) quality of the historical part of the Finnish
newspaper and journal corpus is rather low for reliable search and scientific research on the …
Supervised OCR error detection and correction using statistical and neural machine translation methods
For indexing the content of digitized historical texts, optical character recognition (OCR)
errors are a hampering problem. To explore the effectivity of new strategies for OCR post …
Neural OCR post-hoc correction of historical corpora
Optical character recognition (OCR) is crucial for a deeper access to historical collections.
OCR needs to account for orthographic variations, typefaces, or language evolution (i.e., new …
Social media text normalization for Turkish
G ERYİĞİT… - Natural Language …, 2017 - cambridge.org
Text normalization is an indispensable stage in processing noncanonical language from
natural sources, such as speech, social media or short text messages. Research in this field …
Old content and modern tools-searching named entities in a Finnish OCRed historical newspaper collection 1771-1910
Named Entity Recognition (NER), search, classification and tagging of names and name like
frequent informational elements in texts, has become a standard information extraction …
OCR and post-correction of historical Finnish texts
S Drobac, PS Kauppinen… - Nordic Conference of …, 2017 - researchportal.helsinki.fi
This paper presents experiments on Optical character recognition (OCR) as a combination
of Ocropy software and data-driven spelling correction that uses Weighted Finite-State …
Local string transduction as sequence labeling
We show that the general problem of string transduction can be reduced to the problem of
sequence labeling. While character deletions and insertions are allowed in string …
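The snippet above states that string transduction can be reduced to sequence labeling while still allowing character deletions and insertions. The following is a minimal sketch of one such reduction, not the paper's own scheme (which the snippet does not describe). It assumes Python's standard `difflib.SequenceMatcher` for the character alignment. Each source character receives exactly one tag: `None` (copy unchanged) or the literal string it is rewritten to. Deletions become empty strings, and insertions are folded into a neighbouring character's tag. The helper names `encode` and `decode` are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def encode(src: str, tgt: str) -> list[str | None]:
    """Map a (src, tgt) pair to one tag per source character (illustrative scheme).

    A tag is None (copy the character unchanged) or the literal string the
    character is rewritten to: "" encodes a deletion, and inserted text is
    folded into the tag of a neighbouring character.
    """
    if not src:
        raise ValueError("source must be non-empty: tags attach to source characters")
    out = list(src)  # output string emitted at each source position
    matcher = SequenceMatcher(None, src, tgt, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        if op == "insert":
            # Pure insertion at gap i1: append to the preceding character,
            # or prepend to the first character when the gap is at the start.
            if i1 == 0:
                out[0] = tgt[j1:j2] + out[0]
            else:
                out[i1 - 1] += tgt[j1:j2]
        else:
            # "replace" or "delete": the first character of the span carries
            # the whole replacement ("" for a deletion); the rest are deleted.
            out[i1] = tgt[j1:j2]
            for i in range(i1 + 1, i2):
                out[i] = ""
    return [None if o == ch else o for o, ch in zip(out, src)]


def decode(src: str, tags: list[str | None]) -> str:
    """Apply per-character tags to src to recover the target string."""
    return "".join(ch if tag is None else tag for ch, tag in zip(src, tags))
```

For example, the OCR-style error `encode("tbe", "the")` yields `[None, 'h', None]`, and `decode` maps that back to `"the"`. Using the literal rewrite string as a tag keeps the sketch short. A trained tagger would likely need a more compact label vocabulary.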