Robust text line detection in historical documents: learning and evaluation methods

M Boillet, C Kermorvant, T Paquet - International Journal on Document …, 2022 - Springer
Text line segmentation is one of the key steps in historical document understanding. It is
challenging due to the variety of fonts, contents, writing styles and the quality of documents …

Assessing the impact of OCR noise on multilingual event detection over digitised documents

E Boros, NK Nguyen, G Lejeune, A Doucet - International Journal on …, 2022 - Springer
Event detection is a crucial task for natural language processing and it involves the
identification of instances of specified types of events in text and their classification into …

MatchVIE: Exploiting match relevancy between entities for visual information extraction

G Tang, L Xie, L Jin, J Wang, J Chen, Z Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
Visual Information Extraction (VIE) task aims to extract key information from multifarious
document images (e.g., invoices and purchase receipts). Most previous methods treat the VIE …

Intelligent document processing in end-to-end RPA contexts: a systematic literature review

A Martínez-Rojas, JM López-Carnicer… - Confluence of Artificial …, 2023 - Springer
Automating organizational processes typically involves document processing techniques for
a large document set. For that purpose, the Intelligent Document Processing (IDP) paradigm …

MELHISSA: a multilingual entity linking architecture for historical press articles

E Linhares Pontes, LA Cabrera-Diego… - International journal on …, 2022 - Springer
Digital libraries have a key role in cultural heritage as they provide access to our culture and
history by indexing books and historical documents (newspapers and letters). Digital …

A comprehensive study of open-source libraries for named entity recognition on handwritten historical documents

CB Monroc, B Miret, ML Bonhomme… - … Workshop on Document …, 2022 - Springer
In this paper, we propose an evaluation of several state-of-the-art open-source natural
language processing (NLP) libraries for named entity recognition (NER) on handwritten …

Injecting temporal-aware knowledge in historical named entity recognition

CE González-Gallardo, E Boros, E Giamphy… - … on Information Retrieval, 2023 - Springer
In this paper, we address the detection of named entities in multilingual historical collections.
We argue that, besides the multiple challenges that depend on the quality of digitization (e.g. …

Are end-to-end systems really necessary for NER on handwritten document images?

O Tüselmann, F Wolf, GA Fink - International Conference on Document …, 2021 - Springer
Named entities (NEs) are fundamental in the extraction of information from text. The
recognition and classification of these entities into predefined categories is called Named …

DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents

T Constum, P Tranouez, T Paquet - International Journal on Document …, 2025 - Springer
Information extraction from handwritten documents involves traditionally three
distinct steps: Document Layout Analysis, Handwritten Text Recognition, and Named Entity …

Neural models for semantic analysis of handwritten document images

O Tüselmann, GA Fink - International Journal on Document Analysis and …, 2024 - Springer
Semantic analysis of handwritten document images offers a wide range of practical
application scenarios. A sequential combination of handwritten text recognition (HTR) and a …