Git: A generative image-to-text transformer for vision and language

J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu… - arxiv preprint arxiv …, 2022 - arxiv.org
In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify
vision-language tasks such as image/video captioning and question answering. While …

Trocr: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H **e, Y Wang, Z Mao… - Proceedings of the …, 2021 - openaccess.thecvf.com
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

From two to one: A new scene text recognizer with visual language modeling network

Y Wang, H **e, S Fang, J Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com
In this paper, we abandon the dominant complex language model and rethink the linguistic
learning process in the scene text recognition. Different from previous methods considering …

Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting

Y Liu, C Shen, L **, T He, P Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
End-to-end text-spotting, which aims to integrate detection and recognition in a unified
framework, has attracted increasing attention due to its simplicity of the two complimentary …

Revisiting scene text recognition: A data perspective

Q Jiang, J Wang, D Peng, C Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective.
We begin by revisiting the six commonly used benchmarks in STR and observe a trend of …

Vision transformer with progressive sampling

X Yue, S Sun, Z Kuang, M Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
Transformers with powerful global relation modeling abilities have been introduced to
fundamental computer vision tasks recently. As a typical example, the Vision Transformer …

Multi-granularity prediction for scene text recognition

P Wang, C Da, C Yao - European Conference on Computer Vision, 2022 - Springer
Scene text recognition (STR) has been an active research topic in computer vision for years.
To tackle this challenging problem, numerous innovative methods have been successively …

Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting

S Fang, Z Mao, H **e, Y Wang, C Yan… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Scene text spotting is of great importance to the computer vision community due to its wide
variety of applications. Recent methods attempt to introduce linguistic knowledge for …