Attention where it matters: Rethinking visual document understanding with selective region concentration

H Cao, C Bao, C Liu, H Chen, K Yin… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a novel end-to-end document understanding model called SeRum (SElective
Region Understanding Model) for extracting meaningful information from document images …

You can even annotate text with voice: Transcription-only-supervised text spotting

J Tang, S Qiao, B Cui, Y Ma, S Zhang… - Proceedings of the 30th …, 2022 - dl.acm.org
End-to-end scene text spotting has recently gained great attention in the research
community. The majority of existing methods rely heavily on the location annotations of text …

Filling in the blank: Rationale-augmented prompt tuning for TextVQA

G Zeng, Y Zhang, Y Zhou, B Fang, G Zhao… - Proceedings of the 31st …, 2023 - dl.acm.org
Recently, generative Text-based visual question answering (TextVQA) methods, which are
often based on language models, have exhibited impressive results and drawn increasing …

ICDAR 2023 competition on structured text extraction from visually-rich document images

W Yu, C Zhang, H Cao, W Hua, B Li, H Chen… - … on Document Analysis …, 2023 - Springer
Structured text extraction is one of the most valuable and challenging application directions
in the field of Document AI. However, the scenarios of past benchmarks are limited, and the …

Query-driven generative network for document information extraction in the wild

H Cao, X Li, J Ma, D Jiang, A Guo, Y Hu, H Liu… - Proceedings of the 30th …, 2022 - dl.acm.org
This paper focuses on solving Document Information Extraction (DIE) in the wild problem,
which is rarely explored before. In contrast to existing studies mainly tailored for document …

Lapdoc: Layout-aware prompting for documents

M Lamott, YN Weweler, A Ulges, F Shafait… - … on Document Analysis …, 2024 - Springer
Recent advances in training large language models (LLMs) using massive amounts of
solely textual data lead to strong generalization across many domains and tasks, including …

Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review

A Rombach, P Fettke - arXiv preprint arXiv:2408.06345, 2024 - arxiv.org
Extracting key information from documents represents a large portion of business workloads
and therefore offers a high potential for efficiency improvements and process automation …

Document information extraction via global tagging

S He, T Wang, Y Lu, H Lin, X Han, Y Sun… - … National Conference on …, 2023 - Springer
Document Information Extraction (DIE) is a crucial task for extracting key information
from visually-rich documents. The typical pipeline approach for this task involves Optical …

GenTC: Generative Transformer via Contrastive Learning for Receipt Information Extraction

X Deng, Z Huang, K Ma, K Chen, J Guo… - … Conference on Artificial …, 2023 - Springer
Information Extraction from visually rich documents has attracted increasing
attention due to its various advanced applications in the real world. Most existing methods …

First-place Solution for Streetscape Shop Sign Recognition Competition

B Wang, L Jing - arXiv preprint arXiv:2501.02811, 2025 - arxiv.org
Text recognition technology applied to street-view storefront signs is increasingly utilized
across various practical domains, including map navigation, smart city planning analysis …