Progress in machine translation
After more than 70 years of evolution, great achievements have been made in machine
translation. Especially in recent years, translation quality has been greatly improved with the …
Generative artificial intelligence: a systematic review and applications
In recent years, the study of artificial intelligence (AI) has undergone a paradigm shift. This
has been propelled by the groundbreaking capabilities of generative models both in …
LayoutLMv3: Pre-training for Document AI with unified text and image masking
Self-supervised pre-training techniques have achieved remarkable progress in Document
AI. Most multimodal pre-trained models use a masked language modeling objective to learn …
OCR-free document understanding transformer
Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …
Co-scale conv-attentional image transformers
In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a
Transformer-based image classifier equipped with co-scale and conv-attentional …
Expressive text-to-image generation with rich text
Plain text has become a prevalent interface for text-to-image synthesis. However, its limited
customization options hinder users from accurately describing desired outputs. For example …
LayoutLMv2: Multi-modal pre-training for visually-rich document understanding
Pre-training of text and layout has proved effective in a variety of visually-rich document
understanding tasks due to its effective model architecture and the advantage of large-scale …
DocFormer: End-to-end transformer for document understanding
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual
Document Understanding (VDU). VDU is a challenging problem which aims to understand …
Unifying vision, text, and layout for universal document processing
We propose Universal Document Processing (UDOP), a foundation Document AI
model which unifies text, image, and layout modalities together with varied task formats …
DiT: Self-supervised pre-training for document image transformer
Image Transformer has recently achieved significant progress for natural image
understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) …