Google Scholar

A comprehensive survey of artificial intelligence techniques for talent analytics

C Qin, L Zhang, Y Cheng, R Zha, D Shen… - arXiv preprint arXiv …, 2023 - arxiv.org

In today's competitive and fast-evolving business environment, it is a critical time for
organizations to rethink how to make talent-related decisions in a quantitative manner …

Cited by 42

[PDF] mlr.press

Pix2Struct: Screenshot parsing as pretraining for visual language understanding

K Lee, M Joshi, IR Turc, H Hu, F Liu… - International …, 2023 - proceedings.mlr.press

Visually-situated language is ubiquitous—sources range from textbooks with diagrams to
web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to …

Cited by 264

[PDF] arxiv.org

UReader: Universal OCR-free visually-situated language understanding with multimodal large language model

J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Text is ubiquitous in our visual world, conveying crucial information, such as in documents,
websites, and everyday photographs. In this work, we propose UReader, a first exploration …

Cited by 120

[PDF] thecvf.com

LayoutLLM: Layout instruction tuning with large language models for document understanding

C Luo, Y Shen, Z Zhu, Q Zheng… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recently leveraging large language models (LLMs) or multimodal large language models
(MLLMs) for document understanding has been proven very promising. However previous …

Cited by 34

[PDF] arxiv.org

mPLUG-DocOwl 1.5: Unified structure learning for OCR-free document understanding

A Hu, H Xu, J Ye, M Yan, L Zhang, B Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Structure information is critical for understanding the semantics of text-rich images, such as
documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for …

Cited by 87

[PDF] arxiv.org

mPLUG-DocOwl: Modularized multimodal large language model for document understanding

J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org

Document understanding refers to automatically extract, analyze and comprehend
information from various types of digital documents, such as a web page. Existing Multi …

Cited by 114

[PDF] arxiv.org

TextMonkey: An OCR-free large multimodal model for understanding document

Y Liu, B Yang, Q Liu, Z Li, Z Ma, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks. Our
approach introduces enhancement across several dimensions: By adopting Shifted Window …

Cited by 84

[PDF] thecvf.com

Unifying vision, text, and layout for universal document processing

Z Tang, Z Yang, G Wang, Y Fang… - Proceedings of the …, 2023 - openaccess.thecvf.com

We propose Universal Document Processing (UDOP), a foundation Document AI
model which unifies text, image, and layout modalities together with varied task formats …

Cited by 101

[PDF] arxiv.org

DiT: Self-supervised pre-training for document image transformer

J Li, Y Xu, T Lv, L Cui, C Zhang, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org

Image Transformer has recently achieved significant progress for natural image
understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) …

Cited by 172

[PDF] thecvf.com

GeoLayoutLM: Geometric pre-training for visual information extraction

C Luo, C Cheng, Q Zheng… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Visual information extraction (VIE) plays an important role in Document Intelligence.
Generally, it is divided into two tasks: semantic entity recognition (SER) and relation …

Cited by 63

LayoutLMv3: Pre-training for Document AI with unified text and image masking

A comprehensive survey of artificial intelligence techniques for talent analytics
