Multimodal data integration for oncology in the era of deep neural networks: a review

A Waqas, A Tripathi, RP Ramachandran… - Frontiers in Artificial …, 2024 - frontiersin.org
Cancer research encompasses data across various scales, modalities, and resolutions, from
screening and diagnostic imaging to digitized histopathology slides to various types of …

LayoutLMv3: Pre-training for Document AI with unified text and image masking

Y Huang, T Lv, L Cui, Y Lu, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Self-supervised pre-training techniques have achieved remarkable progress in Document
AI. Most multimodal pre-trained models use a masked language modeling objective to learn …

Self-supervised multimodal learning: A survey

Y Zong, O Mac Aodha, T Hospedales - arXiv preprint arXiv:2304.01008, 2023 - arxiv.org
Multimodal learning, which aims to understand and analyze information from multiple
modalities, has achieved substantial progress in the supervised regime in recent years …

OCR-free document understanding transformer

G Kim, T Hong, M Yim, JY Nam, J Park, J Yim… - … on Computer Vision, 2022 - Springer
Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …

Unifying vision, text, and layout for universal document processing

Z Tang, Z Yang, G Wang, Y Fang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose Universal Document Processing (UDOP), a foundation Document AI
model which unifies text, image, and layout modalities together with varied task formats …

A real-world WebAgent with planning, long context understanding, and program synthesis

I Gur, H Furuta, A Huang, M Safdari, Y Matsuo… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained large language models (LLMs) have recently achieved better generalization and
sample efficiency in autonomous web navigation. However, the performance on real-world …

DiT: Self-supervised pre-training for document image transformer

J Li, Y Xu, T Lv, L Cui, C Zhang, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Image Transformer has recently achieved significant progress for natural image
understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) …

LiLT: A simple yet effective language-independent layout transformer for structured document understanding

J Wang, L Jin, K Ding - arXiv preprint arXiv:2202.13669, 2022 - arxiv.org
Structured document understanding has attracted considerable attention and made
significant progress recently, owing to its crucial role in intelligent document processing …

GeoLayoutLM: Geometric pre-training for visual information extraction

C Luo, C Cheng, Q Zheng… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Visual information extraction (VIE) plays an important role in Document Intelligence.
Generally, it is divided into two tasks: semantic entity recognition (SER) and relation …

BROS: A pre-trained language model focusing on text and layout for better key information extraction from documents

T Hong, D Kim, M Ji, W Hwang, D Nam… - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Key information extraction (KIE) from document images requires understanding the
contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent …