- Academic Search

Document parsing unveiled: Techniques, challenges, and prospects for structured information extraction

Q Zhang, VSJ Huang, B Wang, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Document parsing is essential for converting unstructured and semi-structured documents,
such as contracts, academic papers, and invoices, into structured, machine-readable data …

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

J Zhang, Q Zhang, B Wang, L Ouyang, Z Wen… - arXiv preprint arXiv …, 2024 - arxiv.org

Retrieval-augmented Generation (RAG) enhances Large Language Models (LLMs) by
integrating external knowledge to reduce hallucinations and incorporate up-to-date …

VISA: Retrieval Augmented Generation with Visual Source Attribution

X Ma, S Zhuang, B Koopman, G Zuccon… - arXiv preprint arXiv …, 2024 - arxiv.org

Generation with source attribution is important for enhancing the verifiability of
retrieval-augmented generation (RAG) systems. However, existing approaches in RAG primarily link …

UniCoRN: Unified Commented Retrieval Network with LMMs

M Jaritz, M Guillaumin, S Sternig, L Bazzani - arXiv preprint arXiv …, 2025 - arxiv.org

Multimodal retrieval methods have limitations in handling complex, compositional queries
that require reasoning about the visual content of both the query and the retrieved entities …

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

J Zhou, Z Liu, Z Liu, S Xiao, Y Wang, B Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite the rapidly growing demand for multimodal retrieval, progress in this field remains
severely constrained by a lack of training data. In this paper, we introduce MegaPairs, a …

Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks

S Zhuang, E Khramtsova, X Ma, B Koopman… - arXiv preprint arXiv …, 2025 - arxiv.org

Recent advancements in dense retrieval have introduced vision-language model
(VLM)-based retrievers, such as DSE and ColPali, which leverage document screenshots …

LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating

C Deng, J Yuan, P Bu, P Wang, ZZ Li, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large vision language models (LVLMs) have improved the document understanding
capabilities remarkably, enabling the handling of complex document elements, longer …

VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos

X Ren, L Xu, L Xia, S Wang, D Yin, C Huang - arXiv preprint arXiv …, 2025 - arxiv.org

Retrieval-Augmented Generation (RAG) has demonstrated remarkable success in
enhancing Large Language Models (LLMs) through external knowledge integration, yet its …

How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey

Y Qi, H Li, Y Song, X Wu, J Luo - arXiv preprint arXiv:2412.08158, 2024 - arxiv.org

The exploration of various vision-language tasks, such as visual captioning, visual question
answering, and visual commonsense reasoning, is an important area in artificial intelligence …

An archaeological Catalog Collection Method Based on Large Vision-Language Models

H Pang, Y Chang, T Duan, X Yang - arXiv preprint arXiv:2412.20088, 2024 - arxiv.org

Archaeological catalogs, containing key elements such as artifact images, morphological
descriptions, and excavation information, are essential for studying artifact evolution and …

Visrag: Vision-based retrieval-augmented generation on multi-modality documents
