Summary of ChatGPT-related research and perspective towards the future of large language models

Y Liu, T Han, S Ma, J Zhang, Y Yang, J Tian, H He, A Li… - Meta-Radiology, 2023 - Elsevier
This paper presents a comprehensive survey of ChatGPT-related (GPT-3.5 and GPT-4)
research, state-of-the-art large language models (LLMs) from the GPT series, and their …

Tools, techniques, datasets and application areas for object detection in an image: a review

J Kaur, W Singh - Multimedia Tools and Applications, 2022 - Springer
Object detection is one of the most fundamental and challenging tasks, aiming to locate objects in
images and videos. Over the past years, it has gained much attention, spurring further research on …

LVLM-eHub: A comprehensive evaluation benchmark for large vision-language models

P Xu, W Shao, K Zhang, P Gao, S Liu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite their great success, the field lacks a holistic evaluation …

OCR-free document understanding transformer

G Kim, T Hong, M Yim, JY Nam, J Park, J Yim… - … on Computer Vision, 2022 - Springer
Understanding document images (e.g., invoices) is a core but challenging task, since it
requires complex functions such as reading text and a holistic understanding of the …

BLIVA: A simple multimodal LLM for better handling of text-rich visual questions

W Hu, Y Xu, Y Li, W Li, Z Chen, Z Tu - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Vision Language Models (VLMs), which extend Large Language Models (LLMs) by
incorporating visual understanding capabilities, have demonstrated significant advancements …

LayoutLMv2: Multi-modal pre-training for visually-rich document understanding

Y Xu, Y Xu, T Lv, L Cui, F Wei, G Wang, Y Lu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-training of text and layout has proved effective in a variety of visually-rich document
understanding tasks due to its effective model architecture and the advantage of large-scale …

DocFormer: End-to-end transformer for document understanding

S Appalaraju, B Jasani, BU Kota… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual
Document Understanding (VDU). VDU is a challenging problem that aims to understand …

On the hidden mystery of OCR in large multimodal models

Y Liu, Z Li, B Yang, C Li, X Yin, C Liu, L… - arXiv preprint arXiv …, 2023 - arxiv.org
Large models have recently played a dominant role in natural language processing and
multimodal vision-language learning. However, their effectiveness in text-related visual …

DocPedia: Unleashing the power of large multimodal model in the frequency domain for versatile document understanding

H Feng, Q Liu, H Liu, J Tang, W Zhou, H Li… - Science China …, 2024 - Springer
In this work, we present DocPedia, a novel large multimodal model (LMM) for versatile OCR-
free document understanding, capable of parsing images up to 2560×2560 resolution …

BROS: A pre-trained language model focusing on text and layout for better key information extraction from documents

T Hong, D Kim, M Ji, W Hwang, D Nam… - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Key information extraction (KIE) from document images requires understanding the
contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent …