Summary of ChatGPT-related research and perspective towards the future of large language models

Y Liu, T Han, S Ma, J Zhang, Y Yang, J Tian, H He, A Li… - Meta-radiology, 2023 - Elsevier
This paper presents a comprehensive survey of ChatGPT-related (GPT-3.5 and GPT-4)
research, state-of-the-art large language models (LLMs) from the GPT series, and their …

Tools, techniques, datasets and application areas for object detection in an image: a review

J Kaur, W Singh - Multimedia Tools and Applications, 2022 - Springer
Object detection is one of the most fundamental and challenging tasks, aiming to locate objects in
images and videos. Over the past years, it has gained much attention, spurring more research on …

LVLM-eHub: A comprehensive evaluation benchmark for large vision-language models

P Xu, W Shao, K Zhang, P Gao, S Liu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite this great success, the field lacks a holistic evaluation …

BLIVA: A simple multimodal LLM for better handling of text-rich visual questions

W Hu, Y Xu, Y Li, W Li, Z Chen, Z Tu - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Vision Language Models (VLMs), which extend Large Language Models (LLMs) by
incorporating visual understanding capability, have demonstrated significant advancements …

OCR-free document understanding transformer

G Kim, T Hong, M Yim, JY Nam, J Park, J Yim… - … on Computer Vision, 2022 - Springer
Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …

LayoutLLM: Layout instruction tuning with large language models for document understanding

C Luo, Y Shen, Z Zhu, Q Zheng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recently, leveraging large language models (LLMs) or multimodal large language models
(MLLMs) for document understanding has proven very promising. However, previous …

TextMonkey: An OCR-free large multimodal model for understanding document

Y Liu, B Yang, Q Liu, Z Li, Z Ma, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks. Our
approach introduces enhancements across several dimensions: by adopting Shifted Window …

MMT-Bench: A comprehensive multimodal benchmark for evaluating large vision-language models towards multitask AGI

K Ying, F Meng, J Wang, Z Li, H Lin, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) have made significant strides in general-purpose
multimodal applications such as visual dialogue and embodied navigation. However, …

DocFormer: End-to-end transformer for document understanding

S Appalaraju, B Jasani, BU Kota… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual
Document Understanding (VDU). VDU is a challenging problem that aims to understand …

LayoutLMv2: Multi-modal pre-training for visually-rich document understanding

Y Xu, Y Xu, T Lv, L Cui, F Wei, G Wang, Y Lu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-training of text and layout has proved effective in a variety of visually-rich document
understanding tasks due to its effective model architecture and the advantage of large-scale …