MMVQA: A comprehensive dataset for investigating multipage multimodal information retrieval in pdf-based visual question answering

Y Ding, K Ren, J Huang, S Luo, SC Han - Proceedings of the Thirty-Third …, 2024 - ijcai.org
Document Question Answering (QA) presents a challenge in understanding visually-rich
documents (VRD), particularly with lengthy textual content. Existing studies primarily …

Large Language Models in Finance (FinLLMs)

J Lee, N Stevens, SC Han - Neural Computing and Applications, 2025 - Springer
Large language models (LLMs) have demonstrated remarkable capabilities and have
attracted significant attention across diverse domains, including financial services. Despite …

StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

P Lyu, Y Li, H Zhou, W Ma, X Wan, Q Xie, L Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-rich images have significant and extensive value, deeply integrated into various
aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images …

AiBAT: Artificial Intelligence/Instructions for Build, Assembly, and Test

B Nuernberger, A Liu, H Stefanini, R Otis… - arXiv preprint arXiv …, 2024 - arxiv.org
Instructions for Build, Assembly, and Test (IBAT) refers to the process used whenever any
operation is conducted on hardware, including tests, assembly, and maintenance. Currently …

MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering

Y Ding, K Ren, J Huang, S Luo, SC Han - arXiv preprint arXiv:2404.12720, 2024 - arxiv.org
Document Question Answering (QA) presents a challenge in understanding visually-rich
documents (VRD), particularly those dominated by lengthy textual content like research …

KVP10k: A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents

O Naparstek, O Azulai, I Shapira, E Amrani… - … on Document Analysis …, 2024 - Springer
In recent years, the challenge of extracting information from business documents has
emerged as a critical task, finding applications across numerous domains. This effort has …

DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights

Y Ding, SC Han, Z Li, H Chung - arXiv preprint arXiv:2410.01609, 2024 - arxiv.org
Visually-Rich Documents (VRDs), encompassing elements like charts, tables, and
references, convey complex information across various fields. However, extracting …

Visually Rich Document Understanding and Intelligence

Y Ding - 2024 - ses.library.usyd.edu.au
Visually Rich Documents (VRDs) are potent carriers of multimodal information widely used
in academia, finance, medical fields, and marketing. Traditional approaches to extracting …

Natural Language Processing in Finance: Applications and Opportunities

J Lee - 2024 - ses.library.usyd.edu.au
The research of Natural Language Processing (NLP) in Finance has experienced
considerable development driven by academia and industry. However, small benchmark …