Progress in machine translation

H Wang, H Wu, Z He, L Huang, KW Church - Engineering, 2022 - Elsevier
After more than 70 years of evolution, great achievements have been made in machine
translation. Especially in recent years, translation quality has been greatly improved with the …

Generative artificial intelligence: a systematic review and applications

SS Sengar, AB Hasan, S Kumar, F Carroll - Multimedia Tools and …, 2024 - Springer
In recent years, the study of artificial intelligence (AI) has undergone a paradigm shift. This
has been propelled by the groundbreaking capabilities of generative models both in …

LayoutLMv3: Pre-training for Document AI with unified text and image masking

Y Huang, T Lv, L Cui, Y Lu, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Self-supervised pre-training techniques have achieved remarkable progress in Document
AI. Most multimodal pre-trained models use a masked language modeling objective to learn …

OCR-free document understanding transformer

G Kim, T Hong, M Yim, JY Nam, J Park, J Yim… - … on Computer Vision, 2022 - Springer
Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …

Co-scale conv-attentional image transformers

W Xu, Y Xu, T Chang, Z Tu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a
Transformer-based image classifier equipped with co-scale and conv-attentional …

Expressive text-to-image generation with rich text

S Ge, T Park, JY Zhu, JB Huang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Plain text has become a prevalent interface for text-to-image synthesis. However, its limited
customization options hinder users from accurately describing desired outputs. For example …

LayoutLMv2: Multi-modal pre-training for visually-rich document understanding

Y Xu, Y Xu, T Lv, L Cui, F Wei, G Wang, Y Lu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-training of text and layout has proved effective in a variety of visually-rich document
understanding tasks due to its effective model architecture and the advantage of large-scale …

DocFormer: End-to-end transformer for document understanding

S Appalaraju, B Jasani, BU Kota… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual
Document Understanding (VDU). VDU is a challenging problem which aims to understand …

Unifying vision, text, and layout for universal document processing

Z Tang, Z Yang, G Wang, Y Fang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose Universal Document Processing (UDOP), a foundation Document AI
model which unifies text, image, and layout modalities together with varied task formats …

DiT: Self-supervised pre-training for document image transformer

J Li, Y Xu, T Lv, L Cui, C Zhang, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Image Transformer has recently achieved significant progress for natural image
understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) …