Vision transformer architecture and applications in digital health: a tutorial and survey
The vision transformer (ViT) is a state-of-the-art architecture for image recognition tasks that
plays an important role in digital health applications. Medical images account for 90% of the …
CLIP4STR: A simple baseline for scene text recognition with pre-trained vision-language model
Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …
CDistNet: Perceiving multi-domain character distance for robust text recognition
The transformer-based encoder-decoder framework is becoming popular in scene text
recognition, largely because it naturally integrates recognition clues from both visual and …
HierCode: A lightweight hierarchical codebook for zero-shot Chinese text recognition
Text recognition, especially for complex scripts like Chinese, faces unique challenges due to
its intricate character structures and vast vocabulary. Traditional one-hot encoding methods …
Symmetrical linguistic feature distillation with CLIP for scene text recognition
In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP)
model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature …
Linguistic more: Taking a further step toward efficient and accurate scene text recognition
Vision models have gained increasing attention due to their simplicity and efficiency in Scene
Text Recognition (STR) task. However, due to lacking the perception of linguistic knowledge …
A novel daily runoff forecasting model based on global features and enhanced local feature interpretation
D Xu, Y Hong, W Wang, Z Li, J Wang - Journal of Hydrology, 2024 - Elsevier
The development of artificial intelligence has introduced new perspectives to the field of
hydrological forecasting. However, there is still a lack of research on efficiently identifying …
TPS++: Attention-enhanced thin-plate spline for scene text recognition
Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline
(TPS)-based rectification is widely regarded as an effective means to deal with them …
MRN: Multiplexed routing network for incremental multilingual text recognition
Multilingual text recognition (MLTR) systems typically focus on a fixed set of languages,
which makes it difficult to handle newly added languages or adapt to ever-changing data …
Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection
Scene text detection has made great progress recently with the wide use of pre-training.
Nonetheless, existing scene text detection methods still suffer from two problems: 1) Limited …