Multimodal data integration for oncology in the era of deep neural networks: a review
Cancer research encompasses data across various scales, modalities, and resolutions, from
screening and diagnostic imaging to digitized histopathology slides to various types of …
LayoutLMv3: Pre-training for Document AI with unified text and image masking
Self-supervised pre-training techniques have achieved remarkable progress in Document
AI. Most multimodal pre-trained models use a masked language modeling objective to learn …
Self-supervised multimodal learning: A survey
Multimodal learning, which aims to understand and analyze information from multiple
modalities, has achieved substantial progress in the supervised regime in recent years …
OCR-free document understanding transformer
Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …
Unifying vision, text, and layout for universal document processing
We propose Universal Document Processing (UDOP), a foundation Document AI
model which unifies text, image, and layout modalities together with varied task formats …
A real-world WebAgent with planning, long context understanding, and program synthesis
Pre-trained large language models (LLMs) have recently achieved better generalization and
sample efficiency in autonomous web navigation. However, the performance on real-world …
DiT: Self-supervised pre-training for document image transformer
Image Transformer has recently achieved significant progress for natural image
understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) …
LiLT: A simple yet effective language-independent layout transformer for structured document understanding
Structured document understanding has attracted considerable attention and made
significant progress recently, owing to its crucial role in intelligent document processing …
GeoLayoutLM: Geometric pre-training for visual information extraction
Visual information extraction (VIE) plays an important role in Document Intelligence.
Generally, it is divided into two tasks: semantic entity recognition (SER) and relation …
BROS: A pre-trained language model focusing on text and layout for better key information extraction from documents
Key information extraction (KIE) from document images requires understanding the
contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent …