A survey of OCR in Arabic language: applications, techniques, and challenges
Optical character recognition (OCR) is the process of extracting handwritten or printed text
from a scanned or printed image and converting it to a machine-readable form for further …
from a scanned or printed image and converting it to a machine-readable form for further …
Causal reasoning meets visual representation learning: A prospective study
Visual representation learning is ubiquitous in various real-world applications, including
visual comprehension, video understanding, multi-modal analysis, human-computer …
visual comprehension, video understanding, multi-modal analysis, human-computer …
Scene text recognition with permuted autoregressive sequence models
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
Deepsolo: Let transformer decoder with explicit points solo for text spotting
End-to-end text spotting aims to integrate scene text detection and recognition into a unified
framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in …
framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in …
Vision transformer for fast and efficient scene text recognition
R Atienza - International conference on document analysis and …, 2021 - Springer
Scene text recognition (STR) enables computers to read text in natural scenes such as
object labels, road signs and instructions. STR helps machines perform informed decisions …
object labels, road signs and instructions. STR helps machines perform informed decisions …
Multi-granularity prediction for scene text recognition
Scene text recognition (STR) has been an active research topic in computer vision for years.
To tackle this challenging problem, numerous innovative methods have been successively …
To tackle this challenging problem, numerous innovative methods have been successively …
Sequence-to-sequence contrastive learning for text recognition
We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual
representations, which we apply to text recognition. To account for the sequence-to …
representations, which we apply to text recognition. To account for the sequence-to …
Conditional text image generation with diffusion models
Y Zhu, Z Li, T Wang, M He… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Current text recognition systems, including those for handwritten scripts and scene text, have
relied heavily on image synthesis and augmentation, since it is difficult to realize real-world …
relied heavily on image synthesis and augmentation, since it is difficult to realize real-world …
Towards weakly-supervised text spotting using a multi-task transformer
Text spotting end-to-end methods have recently gained attention in the literature due to the
benefits of jointly optimizing the text detection and recognition components. Existing …
benefits of jointly optimizing the text detection and recognition components. Existing …
CLIP4STR: a simple baseline for scene text recognition with pre-trained vision-language model
Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …
downstream tasks. However, scene text recognition methods still prefer backbones pre …