- Academic Search

Vision transformer architecture and applications in digital health: a tutorial and survey

K Al-Hammuri, F Gebali, A Kanan… - Visual computing for …, 2023 - Springer

The vision transformer (ViT) is a state-of-the-art architecture for image recognition tasks that
plays an important role in digital health applications. Medical images account for 90% of the …

Cited by 52 · All 10 versions

CLIP4STR: A simple baseline for scene text recognition with pre-trained vision-language model

S Zhao, R Quan, L Zhu, Y Yang - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org

Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …

Cited by 30 · All 2 versions

Cdistnet: Perceiving multi-domain character distance for robust text recognition

T Zheng, Z Chen, S Fang, H Xie, YG Jiang - International Journal of …, 2024 - Springer

The transformer-based encoder-decoder framework is becoming popular in scene text
recognition, largely because it naturally integrates recognition clues from both visual and …

Cited by 66 · All 4 versions

Hiercode: A lightweight hierarchical codebook for zero-shot chinese text recognition

Y Zhang, Y Zhu, D Peng, P Zhang, Z Yang, Z Yang… - Pattern Recognition, 2025 - Elsevier

Text recognition, especially for complex scripts like Chinese, faces unique challenges due to
its intricate character structures and vast vocabulary. Traditional one-hot encoding methods …

Cited by 5 · All 2 versions

Symmetrical linguistic feature distillation with clip for scene text recognition

Z Wang, H Xie, Y Wang, J Xu, B Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org

In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP)
model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature …

Cited by 22 · All 4 versions

Linguistic more: Taking a further step toward efficient and accurate scene text recognition

B Zhang, H Xie, Y Wang, J Xu, Y Zhang - arXiv preprint arXiv:2305.05140, 2023 - arxiv.org

Vision models have gained increasing attention due to their simplicity and efficiency in the Scene
Text Recognition (STR) task. However, due to lacking the perception of linguistic knowledge …

Cited by 28 · All 4 versions

A novel daily runoff forecasting model based on global features and enhanced local feature interpretation

D Xu, Y Hong, W Wang, Z Li, J Wang - Journal of Hydrology, 2024 - Elsevier

The development of artificial intelligence has introduced new perspectives to the field of
hydrological forecasting. However, there is still a lack of research on efficiently identifying …

Cited by 3 · All 2 versions

Tps++: Attention-enhanced thin-plate spline for scene text recognition

T Zheng, Z Chen, J Bai, H Xie, YG Jiang - arXiv preprint arXiv:2305.05322, 2023 - arxiv.org

Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline
(TPS)-based rectification is widely regarded as an effective means to deal with them …

Cited by 23 · All 4 versions

Mrn: Multiplexed routing network for incremental multilingual text recognition

T Zheng, Z Chen, B Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Multilingual text recognition (MLTR) systems typically focus on a fixed set of languages,
which makes it difficult to handle newly added languages or adapt to ever-changing data …

Cited by 11 · All 5 versions

Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection

K Wang, H Xie, Y Wang, D Zhang, Y Qu, Z Gao… - Proceedings of the 31st …, 2023 - dl.acm.org

Scene text detection has made great progress recently with the wide use of pre-training.
Nonetheless, existing scene text detection methods still suffer from two problems: 1) Limited …

Cited by 6 · All 2 versions

Petr: Rethinking the capability of transformer-based language model in scene text recognition

Vision transformer architecture and applications in digital health: a tutorial and survey