Progress in machine translation

H Wang, H Wu, Z He, L Huang, KW Church - Engineering, 2022 - Elsevier
After more than 70 years of evolution, great achievements have been made in machine
translation. Especially in recent years, translation quality has been greatly improved with the …

Generative artificial intelligence: a systematic review and applications

SS Sengar, AB Hasan, S Kumar, F Carroll - Multimedia Tools and …, 2024 - Springer
In recent years, the study of artificial intelligence (AI) has undergone a paradigm shift. This
has been propelled by the groundbreaking capabilities of generative models both in …

LayoutLMv3: Pre-training for Document AI with unified text and image masking

Y Huang, T Lv, L Cui, Y Lu, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Self-supervised pre-training techniques have achieved remarkable progress in Document
AI. Most multimodal pre-trained models use a masked language modeling objective to learn …

OCR-free document understanding transformer

G Kim, T Hong, M Yim, JY Nam, J Park, J Yim… - … on Computer Vision, 2022 - Springer
Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …

Co-scale conv-attentional image transformers

W Xu, Y Xu, T Chang, Z Tu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a
Transformer-based image classifier equipped with co-scale and conv-attentional …

Expressive text-to-image generation with rich text

S Ge, T Park, JY Zhu, JB Huang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Plain text has become a prevalent interface for text-to-image synthesis. However, its limited
customization options hinder users from accurately describing desired outputs. For example …

LayoutLMv2: Multi-modal pre-training for visually-rich document understanding

Y Xu, Y Xu, T Lv, L Cui, F Wei, G Wang, Y Lu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-training of text and layout has proved effective in a variety of visually-rich document
understanding tasks due to its effective model architecture and the advantage of large-scale …

DocFormer: End-to-end transformer for document understanding

S Appalaraju, B Jasani, BU Kota… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual
Document Understanding (VDU). VDU is a challenging problem which aims to understand …

Unifying vision, text, and layout for universal document processing

Z Tang, Z Yang, G Wang, Y Fang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose Universal Document Processing (UDOP), a foundation Document AI
model which unifies text, image, and layout modalities together with varied task formats …

DiT: Self-supervised pre-training for document image transformer

J Li, Y Xu, T Lv, L Cui, C Zhang, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Image Transformer has recently achieved significant progress for natural image
understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) …