Survey of post-OCR processing approaches

TTH Nguyen, A Jatowt, M Coustaty… - ACM Computing Surveys …, 2021 - dl.acm.org
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …

Survey of automatic spelling correction

D Hládek, J Staš, M Pleva - Electronics, 2020 - mdpi.com
Automatic spelling correction has been receiving sustained research attention. Although
each article contains a brief introduction to the topic, there is a lack of work that would …

Parallel iterative edit models for local sequence transduction

A Awasthi, S Sarawagi, R Goyal, S Ghosh… - arXiv preprint arXiv …, 2019 - arxiv.org
We present a Parallel Iterative Edit (PIE) model for the problem of local sequence
transduction arising in tasks like Grammatical error correction (GEC). Recent approaches …

Optical character recognition with neural networks and post-correction with finite state methods

S Drobac, K Lindén - International Journal on Document Analysis and …, 2020 - Springer
The optical character recognition (OCR) quality of the historical part of the Finnish
newspaper and journal corpus is rather low for reliable search and scientific research on the …

Supervised OCR error detection and correction using statistical and neural machine translation methods

C Amrhein, S Clematide - Journal for Language Technology and …, 2018 - zora.uzh.ch
For indexing the content of digitized historical texts, optical character recognition (OCR)
errors are a hampering problem. To explore the effectivity of new strategies for OCR post …

Neural OCR post-hoc correction of historical corpora

L Lyu, M Koutraki, M Krickl, B Fetahu - Transactions of the Association …, 2021 - direct.mit.edu
Optical character recognition (OCR) is crucial for a deeper access to historical collections.
OCR needs to account for orthographic variations, typefaces, or language evolution (i.e., new …

Social media text normalization for Turkish

G Eryiğit… - Natural Language …, 2017 - cambridge.org
Text normalization is an indispensable stage in processing noncanonical language from
natural sources, such as speech, social media or short text messages. Research in this field …

Old content and modern tools - searching named entities in a Finnish OCRed historical newspaper collection 1771-1910

K Kettunen, E Mäkelä, T Ruokolainen… - arXiv preprint arXiv …, 2016 - arxiv.org
Named Entity Recognition (NER), search, classification and tagging of names and name like
frequent informational elements in texts, has become a standard information extraction …

OCR and post-correction of historical Finnish texts

S Drobac, PS Kauppinen… - Nordic Conference of …, 2017 - researchportal.helsinki.fi
This paper presents experiments on Optical character recognition (OCR) as a combination
of Ocropy software and data-driven spelling correction that uses Weighted Finite-State …

Local string transduction as sequence labeling

J Ribeiro, S Narayan, S Cohen… - 27th International …, 2018 - research.ed.ac.uk
We show that the general problem of string transduction can be reduced to the problem of
sequence labeling. While character deletions and insertions are allowed in string …