Progress in machine translation

H Wang, H Wu, Z He, L Huang, KW Church - Engineering, 2022 - Elsevier
After more than 70 years of evolution, great achievements have been made in machine
translation. Especially in recent years, translation quality has been greatly improved with the …

A survey of graph neural networks in various learning paradigms: methods, applications, and challenges

L Waikhom, R Patgiri - Artificial Intelligence Review, 2023 - Springer
In the last decade, deep learning has reinvigorated the machine learning field. It has solved
many problems in computer vision, speech recognition, natural language processing, and …

LayoutLMv3: Pre-training for Document AI with unified text and image masking

Y Huang, T Lv, L Cui, Y Lu, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Self-supervised pre-training techniques have achieved remarkable progress in Document
AI. Most multimodal pre-trained models use a masked language modeling objective to learn …

OCR-free document understanding transformer

G Kim, T Hong, M Yim, JY Nam, J Park, J Yim… - … on Computer Vision, 2022 - Springer
Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …

LayoutLLM: Layout instruction tuning with large language models for document understanding

C Luo, Y Shen, Z Zhu, Q Zheng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recently, leveraging large language models (LLMs) or multimodal large language models
(MLLMs) for document understanding has proven very promising. However, previous …

Expressive text-to-image generation with rich text

S Ge, T Park, JY Zhu, JB Huang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Plain text has become a prevalent interface for text-to-image synthesis. However, its limited
customization options hinder users from accurately describing desired outputs. For example …

Co-scale conv-attentional image transformers

W Xu, Y Xu, T Chang, Z Tu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a
Transformer-based image classifier equipped with co-scale and conv-attentional …

mPLUG-DocOwl: Modularized multimodal large language model for document understanding

J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Document understanding refers to automatically extracting, analyzing and comprehending
information from various types of digital documents, such as a web page. Existing Multi …

TextMonkey: An OCR-free large multimodal model for understanding document

Y Liu, B Yang, Q Liu, Z Li, Z Ma, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks. Our
approach introduces enhancements across several dimensions: By adopting Shifted Window …

Nougat: Neural optical understanding for academic documents

L Blecher, G Cucurull, T Scialom, R Stojnic - arXiv preprint arXiv …, 2023 - arxiv.org
Scientific knowledge is predominantly stored in books and scientific journals, often in the
form of PDFs. However, the PDF format leads to a loss of semantic information, particularly …