LVLM-eHub: A comprehensive evaluation benchmark for large vision-language models
Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite this great success, the field lacks a holistic evaluation …
TrOCR: Transformer-based optical character recognition with pre-trained models
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …
Scene text recognition with permuted autoregressive sequence models
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
SVTR: Scene text recognition with a single visual model
Dominant scene text recognition models commonly contain two building blocks, a visual
model for feature extraction and a sequence model for text transcription. This hybrid …
On the hidden mystery of OCR in large multimodal models
Large models have recently played a dominant role in natural language processing and
multimodal vision-language learning. However, their effectiveness in text-related visual …
SwinTextSpotter: Scene text spotting via better synergy between text detection and text recognition
End-to-end scene text spotting has attracted great attention in recent years due to the
success of excavating the intrinsic synergy of the scene text detection and recognition …
ESTextSpotter: Towards better scene text spotting with explicit synergy in transformer
In recent years, end-to-end scene text spotting approaches have been evolving toward the Transformer-based
framework. While previous studies have shown the crucial importance of the intrinsic …
Revisiting scene text recognition: A data perspective
This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective.
We begin by revisiting the six commonly used benchmarks in STR and observe a trend of …
Multi-granularity prediction for scene text recognition
Scene text recognition (STR) has been an active research topic in computer vision for years.
To tackle this challenging problem, numerous innovative methods have been successively …
ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting
Scene text spotting is of great importance to the computer vision community due to its wide
variety of applications. Recent methods attempt to introduce linguistic knowledge for …