Survey of post-OCR processing approaches

TTH Nguyen, A Jatowt, M Coustaty… - ACM Computing Surveys …, 2021 - dl.acm.org
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …

Survey of automatic spelling correction

D Hládek, J Staš, M Pleva - Electronics, 2020 - mdpi.com
Automatic spelling correction has been receiving sustained research attention. Although
each article contains a brief introduction to the topic, there is a lack of work that would …

Test collection based evaluation of information retrieval systems

M Sanderson - Foundations and Trends® in Information …, 2010 - nowpublishers.com
Use of test collections and evaluation measures to assess the effectiveness of information
retrieval systems has its origins in work dating back to the early 1950s. Across the nearly 60 …

The TREC Spoken Document Retrieval Track: A Success Story.

JS Garofolo, CGP Auzanne… - NIST SPECIAL …, 2000 - tsapps.nist.gov
This paper describes work within the NIST Text REtrieval Conference (TREC) over the last
three years in designing and implementing evaluations of Spoken Document Retrieval …

Machine transliteration survey

S Karimi, F Scholer, A Turpin - ACM Computing Surveys (CSUR), 2011 - dl.acm.org
Machine transliteration is the process of automatically transforming the script of a word from
a source language to a target language, while preserving pronunciation. The development …

Streamlining Evaluation with ir-measures

S MacAvaney, C Macdonald, I Ounis - European Conference on …, 2022 - Springer
We present ir-measures, a new tool that makes it convenient to calculate a diverse set of
evaluation measures used in information retrieval. Rather than implementing its own …

Cross‐Evaluation: A new model for information system evaluation

Y Sun, PB Kantor - Journal of the American Society for …, 2006 - Wiley Online Library
In this article, we introduce a new information system evaluation method and report on its
application to a collaborative information seeking system, AntWorld. The key innovation of …

Using lexical disambiguation and named-entity recognition to improve spelling correction in the electronic patient record

P Ruch, R Baud, A Geissbühler - Artificial intelligence in medicine, 2003 - Elsevier
In this article, we show how a set of natural language processing (NLP) tools can be
combined to improve the processing of clinical records. The study concentrates on …

Evaluating and mitigating the impact of OCR errors on information retrieval

LL de Oliveira, DS Vargas, AMA Alexandre… - International Journal on …, 2023 - Springer
Optical character recognition (OCR) is typically used to extract the textual contents of
scanned texts. The output of OCR can be noisy, especially when the quality of the scanned …

Assessing the impact of OCR errors in information retrieval

GT Bazzo, GA Lorentz, D Suarez Vargas… - Advances in Information …, 2020 - Springer
A significant amount of the textual content available on the Web is stored in PDF files. These
files are typically converted into plain text before they can be processed by information …