- Academic Search

A Waqas, A Tripathi, RP Ramachandran… - Frontiers in Artificial …, 2024 - frontiersin.org

Cancer research encompasses data across various scales, modalities, and resolutions, from
screening and diagnostic imaging to digitized histopathology slides to various types of …

Cited by 26

LayoutLMv3: Pre-training for document AI with unified text and image masking

Y Huang, T Lv, L Cui, Y Lu, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org

Self-supervised pre-training techniques have achieved remarkable progress in Document
AI. Most multimodal pre-trained models use a masked language modeling objective to learn …

Cited by 476

Self-supervised multimodal learning: A survey

Y Zong, O Mac Aodha, T Hospedales - arXiv preprint arXiv:2304.01008, 2023 - arxiv.org

Multimodal learning, which aims to understand and analyze information from multiple
modalities, has achieved substantial progress in the supervised regime in recent years …

Cited by 38

OCR-free document understanding transformer

G Kim, T Hong, M Yim, JY Nam, J Park, J Yim… - … on Computer Vision, 2022 - Springer

Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …

Cited by 380

Unifying vision, text, and layout for universal document processing

Z Tang, Z Yang, G Wang, Y Fang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract We propose Universal Document Processing (UDOP), a foundation Document AI
model which unifies text, image, and layout modalities together with varied task formats …

Cited by 102

A real-world WebAgent with planning, long context understanding, and program synthesis

I Gur, H Furuta, A Huang, M Safdari, Y Matsuo… - arXiv preprint arXiv …, 2023 - arxiv.org

Pre-trained large language models (LLMs) have recently achieved better generalization and
sample efficiency in autonomous web navigation. However, the performance on real-world …

Cited by 184

DiT: Self-supervised pre-training for document image transformer

J Li, Y Xu, T Lv, L Cui, C Zhang, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org

Image Transformer has recently achieved significant progress for natural image
understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) …

Cited by 171

LiLT: A simple yet effective language-independent layout transformer for structured document understanding

J Wang, L Jin, K Ding - arXiv preprint arXiv:2202.13669, 2022 - arxiv.org

Structured document understanding has attracted considerable attention and made
significant progress recently, owing to its crucial role in intelligent document processing …

Cited by 153

GeoLayoutLM: Geometric pre-training for visual information extraction

C Luo, C Cheng, Q Zheng… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Visual information extraction (VIE) plays an important role in Document Intelligence.
Generally, it is divided into two tasks: semantic entity recognition (SER) and relation …

Cited by 60

BROS: A pre-trained language model focusing on text and layout for better key information extraction from documents

T Hong, D Kim, M Ji, W Hwang, D Nam… - Proceedings of the AAAI …, 2022 - ojs.aaai.org

Key information extraction (KIE) from document images requires understanding the
contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent …

Cited by 182


Selfdoc: Self-supervised document representation learning

Multimodal data integration for oncology in the era of deep neural networks: a review
