- Academic Search

LL de Oliveira, DS Vargas, AMA Alexandre… - International Journal on …, 2023 - Springer

Optical character recognition (OCR) is typically used to extract the textual contents of
scanned texts. The output of OCR can be noisy, especially when the quality of the scanned …

Cited by: 18

Advancing post-OCR correction: A comparative study of synthetic data

S Guan, D Greene - arXiv preprint arXiv:2408.02253, 2024 - arxiv.org

This paper explores the application of synthetic data in the post-OCR domain on multiple
fronts by conducting experiments to assess the impact of data volume, augmentation, and …

Cited by: 3

Leveraging open large language models for historical named entity recognition

CE González-Gallardo, HTH Tran, A Hamdi… - … Conference on Theory …, 2024 - Springer

The efficacy of large-scale language models (LLMs) as few-shot learners has dominated the
field of natural language processing, achieving state-of-the-art performance in most tasks …

Cited by: 4

Injecting temporal-aware knowledge in historical named entity recognition

CE González-Gallardo, E Boros, E Giamphy… - … on Information Retrieval, 2023 - Springer

In this paper, we address the detection of named entities in multilingual historical collections.
We argue that, besides the multiple challenges that depend on the quality of digitization (e.g. …

Cited by: 9

Archive timeline summarization (ATLS): conceptual framework for timeline generation over historical document collections

N Gutehrlé, A Doucet, A Jatowt - arXiv preprint arXiv:2301.13479, 2023 - arxiv.org

Archive collections are nowadays mostly available through search engines interfaces, which
allow a user to retrieve documents by issuing queries. The study of these collections may be …

Cited by: 4

Confidence-Aware Document OCR Error Detection

A Hemmer, M Coustaty, N Bartolo, JM Ogier - International Workshop on …, 2024 - Springer

Optical Character Recognition (OCR) continues to face accuracy challenges that
impact subsequent applications. To address these errors, we explore the utility of OCR …

Cited by: 1

The digitization of historical astrophysical literature with highly localized figures and figure captions

JP Naiman, PKG Williams, A Goodman - International Journal on Digital …, 2024 - Springer

Scientific articles published prior to the “age of digitization” in the late 1990s contain figures
which are “trapped” within their scanned pages. While progress to extract figures and their …

Cited by: 3

Exploring the capabilities of GPT4-Vision as OCR engine

A Ghiriti, W Göderle, R Kern - … Conference on Theory and Practice of …, 2024 - Springer

Many museums and libraries conducted efforts to digitize their assets, and many historic
documents are now available as digital images. However, these documents are not directly …

Cited by: 1

Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach

D Fleischhacker, W Goederle, R Kern - arXiv preprint arXiv:2401.07787, 2024 - arxiv.org

This paper addresses a major challenge to historical research on the 19th century. Large
quantities of sources have become digitally available for the first time, while extraction …

Cited by: 5

Large Synthetic Data from the arXiv for OCR Post Correction of Historic Scientific Articles

JP Naiman, MG Cosillo, PKG Williams… - arXiv preprint arXiv …, 2023 - arxiv.org

Scientific articles published prior to the "age of digitization" (~1997) require Optical
Character Recognition (OCR) to transform scanned documents into machine-readable text …

Assessing the impact of OCR noise on multilingual event detection over digitised documents

Evaluating and mitigating the impact of OCR errors on information retrieval
