Deep Learning based Visually Rich Document Content Understanding: A Survey
Visually Rich Documents (VRDs) are essential in academia, finance, medical fields, and
marketing due to their multimodal information content. Traditional methods for extracting …
BlueLM-V-3B: Algorithm and system co-design for multimodal large language models on mobile devices
X Lu, Y Chen, C Chen, H Tan, B Chen, Y Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence and growing popularity of multimodal large language models (MLLMs) have
significant potential to enhance various aspects of daily life, from improving communication …
Privacy-aware document visual question answering
Document Visual Question Answering (DocVQA) has quickly grown into a central
task of document understanding. But despite the fact that documents contain sensitive or …
Overview of DocILE 2023: Document Information Localization and Extraction
This paper provides an overview of the DocILE 2023 Competition, its tasks, participant
submissions, the competition results and possible future research directions. This first …
Towards a new research agenda for multimodal enterprise document understanding: What are we missing?
The field of multimodal document understanding has produced a suite of models that have
achieved stellar performance across several tasks, even coming close to human …
Towards reducing hallucination in extracting information from financial reports using Large Language Models
For a financial analyst, the question and answer (Q&A) segment of the company financial
report is a crucial piece of information for various analysis and investment decisions …
Beyond Document Page Classification: Design, Datasets, and Challenges
This paper highlights the need to bring document classification benchmarking closer to real-
world applications, both in the nature of data tested (X: multi-channel, multi-paged, multi …
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
Large Vision-Language Models (LVLMs) have achieved remarkable performance in many
vision-language tasks, yet their capabilities in fine-grained visual understanding remain …
WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
Multimodal document understanding is a challenging task to process and comprehend large
amounts of textual and visual information. Recent advances in Large Language Models …
Deep learning approaches for information extraction from visually rich documents: datasets, challenges and methods
This paper focuses on Information Extraction from Visually Rich Documents, exploring how
deep learning methods are applied in this field. For the purpose of comparing the …