A comprehensive survey of mostly textual document segmentation algorithms since 2008

S Eskenazi, P Gomez-Krämer, JM Ogier - Pattern Recognition, 2017 - Elsevier
In document image analysis, segmentation is the task that identifies the regions of a
document. The increasing number of applications of document analysis requires a good …

M6Doc: a large-scale multi-format, multi-type, multi-layout, multi-language, multi-annotation category dataset for modern document layout analysis

H Cheng, P Zhang, S Wu, J Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Document layout analysis is a crucial prerequisite for document understanding, including
document retrieval and conversion. Most public datasets currently contain only PDF …

Text line segmentation for challenging handwritten document images using fully convolutional network

B Barakat, A Droby, M Kassis… - 2018 16th International …, 2018 - ieeexplore.ieee.org
This paper presents a method for text line segmentation of challenging historical manuscript
images. These manuscript images contain narrow interline spaces with touching …

Full-page text recognition: Learning where to start and when to stop

B Moysset, C Kermorvant, C Wolf - 2017 14th IAPR …, 2017 - ieeexplore.ieee.org
Text line detection and localization is a crucial step for full page document analysis, but still
suffers from heterogeneity of real life documents. In this paper, we present a new approach …

Seam carving for text line extraction on color and grayscale historical manuscripts

N Arvanitopoulos, S Süsstrunk - 2014 14th International …, 2014 - ieeexplore.ieee.org
We propose a novel algorithm for automatic text line extraction on color and gray scale
manuscript pages without prior binarization. Our algorithm is based on seam carving to …

Page segmentation for historical handwritten documents using fully convolutional networks

Y Xu, W He, F Yin, CL Liu - 2017 14th IAPR International …, 2017 - ieeexplore.ieee.org
Page segmentation is a fundamental and challenging task in document image analysis due
to the layout diversity. In this work, we propose a pixel-wise segmentation method for …

Multi-task handwritten document layout analysis

L Quirós - arXiv preprint arXiv:1806.08852, 2018 - arxiv.org
Document Layout Analysis is a fundamental step in Handwritten Text Processing systems,
from the extraction of the text lines to the type of zone it belongs to. We present a system …

ADoPD: A large-scale document page decomposition dataset

J Gu, X Shi, J Kuen, L Qi, R Zhang, A Liu… - The Twelfth …, 2024 - openreview.net
Research in document image understanding is hindered by limited high-quality document
data. To address this, we introduce ADOPD, a comprehensive dataset for document page …

Unsupervised wall detector in architectural floor plans

LP De Las Heras, D Fernández… - 2013 12th …, 2013 - ieeexplore.ieee.org
Wall detection in floor plans is a crucial step in a complete floor plan recognition system.
Walls define the main structure of buildings and convey essential information for the …

Learning-free text line segmentation for historical handwritten documents

B Kurar Barakat, R Cohen, A Droby, I Rabaev… - Applied Sciences, 2020 - mdpi.com
We present a learning-free method for text line segmentation of historical handwritten
document images. This method relies on automatic scale selection together with second …