Vision transformer architecture and applications in digital health: a tutorial and survey

K Al-Hammuri, F Gebali, A Kanan… - Visual computing for …, 2023 - Springer
The vision transformer (ViT) is a state-of-the-art architecture for image recognition tasks that
plays an important role in digital health applications. Medical images account for 90% of the …

CLIP4STR: A simple baseline for scene text recognition with pre-trained vision-language model

S Zhao, R Quan, L Zhu, Y Yang - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …

CDistNet: Perceiving multi-domain character distance for robust text recognition

T Zheng, Z Chen, S Fang, H Xie, YG Jiang - International Journal of …, 2024 - Springer
The transformer-based encoder-decoder framework is becoming popular in scene text
recognition, largely because it naturally integrates recognition clues from both visual and …

HierCode: A lightweight hierarchical codebook for zero-shot Chinese text recognition

Y Zhang, Y Zhu, D Peng, P Zhang, Z Yang, Z Yang… - Pattern Recognition, 2025 - Elsevier
Text recognition, especially for complex scripts like Chinese, faces unique challenges due to
its intricate character structures and vast vocabulary. Traditional one-hot encoding methods …

Symmetrical linguistic feature distillation with CLIP for scene text recognition

Z Wang, H Xie, Y Wang, J Xu, B Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org
In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP)
model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature …

Linguistic More: Taking a further step toward efficient and accurate scene text recognition

B Zhang, H Xie, Y Wang, J Xu, Y Zhang - arXiv preprint arXiv:2305.05140, 2023 - arxiv.org
Vision models have gained increasing attention due to their simplicity and efficiency in Scene
Text Recognition (STR) task. However, due to lacking the perception of linguistic knowledge …

A novel daily runoff forecasting model based on global features and enhanced local feature interpretation

D Xu, Y Hong, W Wang, Z Li, J Wang - Journal of Hydrology, 2024 - Elsevier
The development of artificial intelligence has introduced new perspectives to the field of
hydrological forecasting. However, there is still a lack of research on efficiently identifying …

TPS++: Attention-enhanced thin-plate spline for scene text recognition

T Zheng, Z Chen, J Bai, H Xie, YG Jiang - arXiv preprint arXiv:2305.05322, 2023 - arxiv.org
Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline
(TPS)-based rectification is widely regarded as an effective means to deal with them …

MRN: Multiplexed routing network for incremental multilingual text recognition

T Zheng, Z Chen, B Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multilingual text recognition (MLTR) systems typically focus on a fixed set of languages,
which makes it difficult to handle newly added languages or adapt to ever-changing data …

Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection

K Wang, H Xie, Y Wang, D Zhang, Y Qu, Z Gao… - Proceedings of the 31st …, 2023 - dl.acm.org
Scene text detection has made great progress recently with the wide use of pre-training.
Nonetheless, existing scene text detection methods still suffer from two problems: 1) Limited …