Survey of post-OCR processing approaches
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …
An OCR post-correction approach using deep learning for processing medical reports
S Karthikeyan, AGS de Herrera… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
According to a recent Deloitte study, the COVID-19 pandemic continues to place a huge
strain on the global health care sector. Covid-19 has also catalysed digital transformation …
Advancing post-OCR correction: A comparative study of synthetic data
This paper explores the application of synthetic data in the post-OCR domain on multiple
fronts by conducting experiments to assess the impact of data volume, augmentation, and …
A hybrid solution for extracting information from unstructured data using optical character recognition (OCR) with natural language processing (NLP)
B Dash - Research Gate, 2021 - researchgate.net
With rapid digitalization, organizations are producing a lot of data as part of their day-to-day
operations. These data are stored either on their legacy platforms or in any cloud storage …
Correcting Arabic soft spelling mistakes using BiLSTM-based machine learning
Soft spelling errors are a class of spelling mistakes that is widespread among native Arabic
speakers and foreign learners alike. Some of these errors are typographical in nature. They …
Post-OCR Text Correction for Bulgarian Historical Documents
The digitization of historical documents is crucial for preserving the cultural heritage of the
society. An important step in this process is converting scanned images to text using Optical …
Toward a period-specific optimized neural network for OCR error correction of historical Hebrew texts
Over the past few decades, large archives of paper-based historical documents, such as
books and newspapers, have been digitized using the Optical Character Recognition (OCR) …
A concise survey of OCR for low-resource languages
Modern natural language processing (NLP) techniques increasingly require substantial
amounts of data to train robust algorithms. Building such technologies for low-resource …
Synthetically Augmented Self-Supervised Fine-Tuning for Diverse Text OCR Correction
The adoption of Optical Character Recognition (OCR) tools has been central to the
increased digitization of historical documents. However, the errors introduced during OCR …
Leveraging text repetitions and denoising autoencoders in OCR post-correction
A common approach for improving OCR quality is a post-processing step based on models
correcting misdetected characters and tokens. These models are typically trained on aligned …