- Academic Search

Deep Learning based Visually Rich Document Content Understanding: A Survey

Y Ding, J Lee, SC Han - arXiv preprint arXiv:2408.01287, 2024 - arxiv.org

Visually Rich Documents (VRDs) are essential in academia, finance, medical fields, and
marketing due to their multimodal information content. Traditional methods for extracting …

BlueLM-V-3B: Algorithm and system co-design for multimodal large language models on mobile devices

X Lu, Y Chen, C Chen, H Tan, B Chen, Y Xie… - arXiv preprint arXiv …, 2024 - arxiv.org

The emergence and growing popularity of multimodal large language models (MLLMs) have
significant potential to enhance various aspects of daily life, from improving communication …

Privacy-aware document visual question answering

R Tito, K Nguyen, M Tobaben, R Kerkouche… - … on Document Analysis …, 2024 - Springer

Document Visual Question Answering (DocVQA) has quickly grown into a central
task of document understanding. But despite the fact that documents contain sensitive or …

Overview of DocILE 2023: Document Information Localization and Extraction

Š Šimsa, M Uřičář, M Šulc, Y Patel, A Hamdi… - … Conference of the Cross …, 2023 - Springer

This paper provides an overview of the DocILE 2023 Competition, its tasks, participant
submissions, the competition results and possible future research directions. This first …

Towards a new research agenda for multimodal enterprise document understanding: What are we missing?

A Nourbakhsh, S Shah, C Rose - Findings of the Association for …, 2024 - aclanthology.org

The field of multimodal document understanding has produced a suite of models that have
achieved stellar performance across several tasks, even coming close to human …

Towards reducing hallucination in extracting information from financial reports using Large Language Models

B Sarmah, D Mehta, S Pasquali, T Zhu - Proceedings of the Third …, 2023 - dl.acm.org

For a financial analyst, the question and answer (Q&A) segment of the company financial
report is a crucial piece of information for various analysis and investment decisions …

Beyond Document Page Classification: Design, Datasets, and Challenges

J Van Landeghem, S Biswas… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper highlights the need to bring document classification benchmarking closer to
real-world applications, both in the nature of data tested (X: multi-channel, multi-paged, multi …

MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding

F Zhu, Z Liu, XY Ng, H Wu, W Wang, F Feng… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Vision-Language Models (LVLMs) have achieved remarkable performance in many
vision-language tasks, yet their capabilities in fine-grained visual understanding remain …

WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

X Xie, H Yan, L Yin, Y Liu, J Ding, M Liao, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal document understanding is a challenging task to process and comprehend large
amounts of textual and visual information. Recent advances in Large Language Models …

Deep learning approaches for information extraction from visually rich documents: datasets, challenges and methods

H Gbada, K Kalti, MA Mahjoub - International Journal on Document …, 2024 - Springer

This paper focuses on Information Extraction from Visually Rich Documents, exploring how
deep learning methods are applied in this field. For the purpose of comparing the …
