- Academic Search

Progress in machine translation

H Wang, H Wu, Z He, L Huang, KW Church - Engineering, 2022 - Elsevier

After more than 70 years of evolution, great achievements have been made in machine
translation. Especially in recent years, translation quality has been greatly improved with the …

Cited by 230 · All 2 versions

A survey of graph neural networks in various learning paradigms: methods, applications, and challenges

L Waikhom, R Patgiri - Artificial Intelligence Review, 2023 - Springer

In the last decade, deep learning has reinvigorated the machine learning field. It has solved
many problems in computer vision, speech recognition, natural language processing, and …

Cited by 87 · All 5 versions

[PDF] arxiv.org

LayoutLMv3: Pre-training for Document AI with unified text and image masking

Y Huang, T Lv, L Cui, Y Lu, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org

Self-supervised pre-training techniques have achieved remarkable progress in Document
AI. Most multimodal pre-trained models use a masked language modeling objective to learn …

Cited by 477 · All 3 versions

[PDF] arxiv.org

OCR-free document understanding transformer

G Kim, T Hong, M Yim, JY Nam, J Park, J Yim… - … on Computer Vision, 2022 - Springer

Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …

Cited by 383 · All 7 versions

[PDF] thecvf.com

LayoutLLM: Layout instruction tuning with large language models for document understanding

C Luo, Y Shen, Z Zhu, Q Zheng… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recently leveraging large language models (LLMs) or multimodal large language models
(MLLMs) for document understanding has been proven very promising. However previous …

Cited by 34 · All 8 versions

[PDF] thecvf.com

Expressive text-to-image generation with rich text

S Ge, T Park, JY Zhu, JB Huang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Plain text has become a prevalent interface for text-to-image synthesis. However, its limited
customization options hinder users from accurately describing desired outputs. For example …

Cited by 73 · All 6 versions

[PDF] thecvf.com

Co-scale conv-attentional image transformers

W Xu, Y Xu, T Chang, Z Tu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a
Transformer-based image classifier equipped with co-scale and conv-attentional …

Cited by 435 · All 6 versions

[PDF] arxiv.org

mPLUG-DocOwl: Modularized multimodal large language model for document understanding

J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org

Document understanding refers to automatically extract, analyze and comprehend
information from various types of digital documents, such as a web page. Existing Multi …

Cited by 114 · All 3 versions

[PDF] arxiv.org

TextMonkey: An OCR-free large multimodal model for understanding document

Y Liu, B Yang, Q Liu, Z Li, Z Ma, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks. Our
approach introduces enhancement across several dimensions: By adopting Shifted Window …

Cited by 84 · All 3 versions

[PDF] arxiv.org

Nougat: Neural optical understanding for academic documents

L Blecher, G Cucurull, T Scialom, R Stojnic - arXiv preprint arXiv …, 2023 - arxiv.org

Scientific knowledge is predominantly stored in books and scientific journals, often in the
form of PDFs. However, the PDF format leads to a loss of semantic information, particularly …

Cited by 97 · All 8 versions
