Scene text detection and recognition: The deep learning era
With the rise and development of deep learning, computer vision has been tremendously
transformed and reshaped. As an important research area in computer vision, scene text …
transformed and reshaped. As an important research area in computer vision, scene text …
Text recognition in the wild: A survey
The history of text can be traced back over thousands of years. Rich and precise semantic
information carried by text is important in a wide range of vision-based application …
information carried by text is important in a wide range of vision-based application …
How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
Deepseek-vl: towards real-world vision-language understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …
world vision and language understanding applications. Our approach is structured around …
Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …
model linguistic rules in end-to-end deep networks remains a research challenge. In this …
Gliding vertex on the horizontal bounding box for multi-oriented object detection
Object detection has recently experienced substantial progress. Yet, the widely adopted
horizontal bounding box representation is not appropriate for ubiquitous oriented objects …
horizontal bounding box representation is not appropriate for ubiquitous oriented objects …
Scene text recognition with permuted autoregressive sequence models
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
Towards accurate scene text recognition with semantic reasoning networks
Scene text image contains two levels of contents: visual texture and semantic information.
Although the previous scene text recognition methods have made great progress over the …
Although the previous scene text recognition methods have made great progress over the …
Rotation-sensitive regression for oriented scene text detection
Text in natural images is of arbitrary orientations, requiring detection in terms of oriented
bounding boxes. Normally, a multi-oriented text detector often involves two key tasks: 1) text …
bounding boxes. Normally, a multi-oriented text detector often involves two key tasks: 1) text …
Towards end-to-end unified scene text detection and layout analysis
Scene text detection and document layout analysis have long been treated as two separate
tasks in different image domains. In this paper, we bring them together and introduce the …
tasks in different image domains. In this paper, we bring them together and introduce the …