- Academic Search

MMVQA: A comprehensive dataset for investigating multipage multimodal information retrieval in pdf-based visual question answering

Y Ding, K Ren, J Huang, S Luo, SC Han - Proceedings of the Thirty-Third …, 2024 - ijcai.org

Document Question Answering (QA) presents a challenge in understanding visually-rich
documents (VRD), particularly with lengthy textual content. Existing studies primarily …

Cited by 2

Large Language Models in Finance (FinLLMs)

J Lee, N Stevens, SC Han - Neural Computing and Applications, 2025 - Springer

Large language models (LLMs) have demonstrated remarkable capabilities and have
attracted significant attention across diverse domains, including financial services. Despite …

StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

P Lyu, Y Li, H Zhou, W Ma, X Wan, Q Xie, L Wu… - arxiv preprint arxiv …, 2024 - arxiv.org

Text-rich images have significant and extensive value, deeply integrated into various
aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images …

Cited by 1

AiBAT: Artificial Intelligence/Instructions for Build, Assembly, and Test

B Nuernberger, A Liu, H Stefanini, R Otis… - arxiv preprint arxiv …, 2024 - arxiv.org

Instructions for Build, Assembly, and Test (IBAT) refers to the process used whenever any
operation is conducted on hardware, including tests, assembly, and maintenance. Currently …

MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering

Y Ding, K Ren, J Huang, S Luo, SC Han - arxiv preprint arxiv:2404.12720, 2024 - arxiv.org

Document Question Answering (QA) presents a challenge in understanding visually-rich
documents (VRD), particularly those dominated by lengthy textual content like research …

Cited by 5

KVP10k: A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents

O Naparstek, O Azulai, I Shapira, E Amrani… - … on Document Analysis …, 2024 - Springer

In recent years, the challenge of extracting information from business documents has
emerged as a critical task, finding applications across numerous domains. This effort has …

DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights

Y Ding, SC Han, Z Li, H Chung - arxiv preprint arxiv:2410.01609, 2024 - arxiv.org

Visually-Rich Documents (VRDs), encompassing elements like charts, tables, and
references, convey complex information across various fields. However, extracting …

Visually Rich Document Understanding and Intelligence

Y Ding - 2024 - ses.library.usyd.edu.au

Visually Rich Documents (VRDs) are potent carriers of multimodal information widely used
in academia, finance, medical fields, and marketing. Traditional approaches to extracting …

Natural Language Processing in Finance: Applications and Opportunities.

J Lee - 2024 - ses.library.usyd.edu.au

The research of Natural Language Processing (NLP) in Finance has experienced
considerable development driven by academia and industry. However, small benchmark …

Form-NLU: Dataset for the Form Natural Language Understanding
