Digitizing history: transitioning historical paper documents to digital content for information retrieval and mining—a comprehensive survey

N Girdhar, M Coustaty, A Doucet - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Historical document processing (HDP) corresponds to the task of converting the physical-
bind form of historical archives into a web-based centrally digitized form for their …

A survey of historical document image datasets

K Nikolaidou, M Seuret, H Mokayed… - International Journal on …, 2022 - Springer
This paper presents a systematic literature review of image datasets for document image
analysis, focusing on historical documents, such as handwritten manuscripts and early …

Deep learning for historical document analysis and recognition—a survey

F Lombardi, S Marinai - Journal of Imaging, 2020 - mdpi.com
Nowadays, deep learning methods are employed in a broad range of research fields. The
analysis and recognition of historical documents, as we survey in this work, is not an …

M5HisDoc: A large-scale multi-style Chinese historical document analysis benchmark

Y Shi, C Liu, D Peng, C Jian… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recognizing and organizing text in correct reading order plays a crucial role in historical
document analysis and preservation. While existing methods have shown promising …

Reproducing the Past: A Dataset for Benchmarking Inscription Restoration

S Zhu, H Xue, N Nie, C Zhu, H Liu, P Fang - Proceedings of the 32nd …, 2024 - dl.acm.org
Inscriptions on ancient steles, as carriers of culture, encapsulate the humanistic thoughts
and aesthetic values of our ancestors. However, these relics often deteriorate due to …

Labeling, cutting, grouping: an efficient text line segmentation method for medieval manuscripts

M Alberti, L Vögtlin, V Pondenkandath… - 2019 International …, 2019 - ieeexplore.ieee.org
This paper introduces a new way for text-line extraction by integrating deep-learning based
pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex …

Intelligent automation of invoice parsing using computer vision techniques

A Chazhoor, VR Sarobin - Multimedia Tools and Applications, 2022 - Springer
Manual parsing of invoices is a tedious, arduous and error-prone task. Due to the academic
and business importance of this problem, it has attracted the attention of machine learning …

SCUT-CAB: a new benchmark dataset of ancient Chinese books with complex layouts for document layout analysis

H Cheng, C Jian, S Wu, L Jin - International Conference on Frontiers in …, 2022 - Springer
Ancient books are the cultural heritage of human civilization, among which there are quite a
few precious collections in China. However, compared to modern documents, the absence …

U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts

S Zottin, A De Nardin, E Colombi, C Piciarelli… - Neural Computing and …, 2024 - Springer
Document Layout Analysis, which is the task of identifying different semantic
regions inside of a document page, is a subject of great interest for both computer scientists …

Low-shot transfer with attention for highly imbalanced cursive character recognition

A Jalali, S Kavuri, M Lee - Neural Networks, 2021 - Elsevier
Recognition of ancient Korean–Chinese cursive character (Hanja) is a challenging
problem mainly because of large number of classes, damaged cursive characters, various …