Text recognition in the wild: A survey

X Chen, L Jin, Y Zhu, C Luo, T Wang - ACM Computing Surveys (CSUR), 2021 - dl.acm.org

The history of text can be traced back over thousands of years. Rich and precise semantic
information carried by text is important in a wide range of vision-based application …

Cited by 256

How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer

In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Cited by 348

Deepseek-vl: towards real-world vision-language understanding

H Lu, W Liu, B Zhang, B Wang, K Dong, B Liu… - arxiv preprint arxiv …, 2024 - arxiv.org

We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …

Cited by 420

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H Xie, Y Wang, Z Mao… - Proceedings of the …, 2021 - openaccess.thecvf.com

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …

Gliding vertex on the horizontal bounding box for multi-oriented object detection

Y Xu, M Fu, Q Wang, Y Wang, K Chen… - IEEE transactions on …, 2020 - ieeexplore.ieee.org

Object detection has recently experienced substantial progress. Yet, the widely adopted
horizontal bounding box representation is not appropriate for ubiquitous oriented objects …

Cited by 786

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer

Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

Cited by 198

Towards accurate scene text recognition with semantic reasoning networks

D Yu, X Li, C Zhang, T Liu, J Han… - Proceedings of the …, 2020 - openaccess.thecvf.com

Scene text image contains two levels of contents: visual texture and semantic information.
Although the previous scene text recognition methods have made great progress over the …

Cited by 590

Rotation-sensitive regression for oriented scene text detection

M Liao, Z Zhu, B Shi, G Xia… - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com

Text in natural images is of arbitrary orientations, requiring detection in terms of oriented
bounding boxes. Normally, a multi-oriented text detector often involves two key tasks: 1) text …