LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis
Recent advances in document image analysis (DIA) have been primarily driven by the
application of neural networks. Ideally, research outcomes could be easily deployed in …
A survey of historical document image datasets
This paper presents a systematic literature review of image datasets for document image
analysis, focusing on historical documents, such as handwritten manuscripts and early …
DocSegTr: an instance-level end-to-end document image segmentation transformer
Understanding documents with rich layouts is an essential step towards information
extraction. Business intelligence processes often require the extraction of useful semantic …
SwinDocSegmenter: An end-to-end unified domain adaptive transformer for document instance segmentation
Instance-level segmentation of documents consists in assigning a class-aware and instance-
aware label to each pixel of the image. It is a key step in document parsing for their …
Beyond document object detection: instance-level segmentation of complex layouts
Information extraction is a fundamental task of many business intelligence services
that entail massive document processing. Understanding a document page structure in …
Enhancing optical character recognition: Efficient techniques for document layout analysis and text line detection
In recent years, automatic document and text analysis has gained significant importance,
driven by advancements in optical character recognition (OCR) technology and the need for …
M5HisDoc: A Large-scale Multi-style Chinese Historical Document Analysis Benchmark
Recognizing and organizing text in correct reading order plays a crucial role in historical
document analysis and preservation. While existing methods have shown promising …
Digital Peter: New dataset, competition and handwriting recognition methods
This paper presents a new dataset of Peter the Great's manuscripts and describes a
segmentation procedure that converts initial images of documents into lines. This new …
Efficient OCR for building a diverse digital history
Many users consult digital archives daily, but the information they can access is
unrepresentative of the diversity of documentary history. The sequence-to-sequence …
Parsing electronic theses and dissertations using object detection
Electronic theses and dissertations (ETDs) contain valuable knowledge that can be useful
for a wide range of purposes. To effectively utilize the knowledge contained in ETDs for …