- Academic Search

Deepseek-vl: towards real-world vision-language understanding

H Lu, W Liu, B Zhang, B Wang, K Dong, B Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …

Cited by 217

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H Xie, Y Wang, Z Mao… - Proceedings of the …, 2021 - openaccess.thecvf.com

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …

Cited by 424

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer

Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

Cited by 206

Revisiting scene text recognition: A data perspective

Q Jiang, J Wang, D Peng, C Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective.
We begin by revisiting the six commonly used benchmarks in STR and observe a trend of …

Cited by 47

Dtrocr: Decoder-only transformer for optical character recognition

M Fujitake - Proceedings of the IEEE/CVF winter conference …, 2024 - openaccess.thecvf.com

Typical text recognition methods rely on an encoder-decoder structure, in which the encoder
extracts features from an image, and the decoder produces recognized text from these …

Cited by 45

Mllmguard: A multi-dimensional safety evaluation suite for multimodal large language models

T Gu, Z Zhou, K Huang, L Dandan… - Advances in …, 2025 - proceedings.neurips.cc

Powered by remarkable advancements in Large Language Models (LLMs), Multimodal
Large Language Models (MLLMs) demonstrate impressive capabilities in manifold tasks …

Cited by 10

Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting

S Fang, Z Mao, H Xie, Y Wang, C Yan… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Scene text spotting is of great importance to the computer vision community due to its wide
variety of applications. Recent methods attempt to introduce linguistic knowledge for …

Cited by 60

What if we only use real datasets for scene text recognition? toward scene text recognition with fewer labels

J Baek, Y Matsui, K Aizawa - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

Scene text recognition (STR) task has a common practice: All state-of-the-art STR models
are trained on large synthetic data. In contrast to this practice, training STR models only on …

Cited by 118

LISTER: neighbor decoding for length-insensitive scene text recognition

C Cheng, P Wang, C Da, Q Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

The diversity in length constitutes a significant characteristic of text. Due to the long-tail
distribution of text lengths, most existing methods for scene text recognition (STR) only work …

Cited by 25

CLIP4STR: a simple baseline for scene text recognition with pre-trained vision-language model

S Zhao, R Quan, L Zhu, Y Yang - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org

Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …

Cited by 33

Uber-text: A large-scale dataset for optical character recognition from street-level imagery

Deepseek-vl: towards real-world vision-language understanding
