PDF-VQA: A new dataset for real-world VQA on PDF documents

Y Ding, S Luo, H Chung, SC Han - Joint European Conference on …, 2023 - Springer
Document-based Visual Question Answering examines the document
understanding of document images in conditions of natural language questions. We …

A survey of recent approaches to form understanding in scanned documents

A Abdallah, D Eberharter, Z Pfister, A Jatowt - Artificial Intelligence Review, 2024 - Springer
This paper presents a comprehensive survey of over 100 research works on the topic of form
understanding in the context of scanned documents. We delve into recent advancements …

Towards Multi-modal Interpretation and Explanation

S Luo - 2023 - ses.library.usyd.edu.au
Multimodal task processes on different modalities simultaneously. Visual Question
Answering, as a type of multimodal task, aims to answer the natural question answering …