- Academic Search

Tools, techniques, datasets and application areas for object detection in an image: a review

J Kaur, W Singh - Multimedia Tools and Applications, 2022 - Springer

Object detection is one of the most fundamental and challenging tasks to locate objects in
images and videos. Over the past, it has gained much attention to do more research on …

Cited by: 142 · All 8 versions

[PDF] researchgate.net

Scene text detection and recognition: The deep learning era

S Long, X He, C Yao - International Journal of Computer Vision, 2021 - Springer

With the rise and development of deep learning, computer vision has been tremendously
transformed and reshaped. As an important research area in computer vision, scene text …

Cited by: 536 · All 8 versions

[PDF] ieee.org

LVLM-eHub: A comprehensive evaluation benchmark for large vision-language models

P Xu, W Shao, K Zhang, P Gao, S Liu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation …

Cited by: 184 · All 3 versions

[PDF] arxiv.org

GIT: A generative image-to-text transformer for vision and language

J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify
vision-language tasks such as image/video captioning and question answering. While …

Cited by: 570 · All 4 versions

[PDF] aaai.org

TrOCR: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org

Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Cited by: 445 · All 4 versions

[PDF] arxiv.org

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer

Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

Cited by: 198 · All 6 versions

[PDF] thecvf.com

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H Xie, Y Wang, Z Mao… - Proceedings of the …, 2021 - openaccess.thecvf.com

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …

Cited by: 420 · All 6 versions

[PDF] arxiv.org

SVTR: Scene text recognition with a single visual model

Y Du, Z Chen, C Jia, X Yin, T Zheng, C Li, Y Du… - arXiv preprint arXiv …, 2022 - arxiv.org

Dominant scene text recognition models commonly contain two building blocks, a visual
model for feature extraction and a sequence model for text transcription. This hybrid …

Cited by: 205 · All 5 versions

[PDF] thecvf.com

From two to one: A new scene text recognizer with visual language modeling network

Y Wang, H Xie, S Fang, J Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com

In this paper, we abandon the dominant complex language model and rethink the linguistic
learning process in the scene text recognition. Different from previous methods considering …

Cited by: 184 · All 6 versions

[PDF] arxiv.org

On the hidden mystery of OCR in large multimodal models

Y Liu, Z Li, B Yang, C Li, X Yin, C Liu, L Jin… - arXiv preprint arXiv …, 2023 - arxiv.org

Large models have recently played a dominant role in natural language processing and
multimodal vision-language learning. However, their effectiveness in text-related visual …

Cited by: 174 · All 2 versions


A robust arbitrary text detection system for natural scene images

Tools, techniques, datasets and application areas for object detection in an image: a review
