- Academic Search

Turning a clip model into a scene text detector

W Yu, Y Liu, W Hua, D Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com

The recent large-scale Contrastive Language-Image Pretraining (CLIP) model has shown
great potential in various downstream tasks via leveraging the pretrained vision and …

Cited by 72 (7 versions)

[PDF] acm.org

Towards robust real-time scene text detection: From semantic to instance representation learning

X Qin, P Lyu, C Zhang, Y Zhou, K Yao… - Proceedings of the 31st …, 2023 - dl.acm.org

Due to the flexible representation of arbitrary-shaped scene text and simple pipeline, bottom-
up segmentation-based methods begin to be mainstream in real-time scene text detection …

Cited by 15 (4 versions)

[PDF] arxiv.org

Turning a clip model into a scene text spotter

W Yu, Y Liu, X Zhu, H Cao, X Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP)
model to enhance scene text detection and spotting tasks, transforming it into a robust …

Cited by 9 (6 versions)

[PDF] thecvf.com

Clipter: Looking at the bigger picture in scene text recognition

A Aberdam, D Bensaïd, A Golts… - Proceedings of the …, 2023 - openaccess.thecvf.com

Reading text in real-world scenarios often requires understanding the context surrounding it,
especially when dealing with poor-quality text. However, current scene text recognizers are …

Cited by 21 (8 versions)

[PDF] acm.org

Perceiving ambiguity and semantics without recognition: an efficient and effective ambiguous scene text detector

Y Shu, W Wang, Y Zhou, S Liu, A Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org

Ambiguous scene text detection is an extremely challenging task. Existing text detectors that
rely solely on visual cues often suffer from confusion due to being evenly distributed in …

Cited by 9

Leveraging Contrastive Language–Image Pre-Training and Bidirectional Cross-attention for Multimodal Keyword Spotting

D Liu, Q Mao, L Gao, G Wang - Engineering Applications of Artificial …, 2024 - Elsevier

In resource-limited keyword spotting scenarios, the scarcity of annotated corpora hinders
deep learning's ability to develop robust models for representing acoustic features. Recent …

Cited by 2

[PDF] arxiv.org

Less is more: Removing text-regions improves clip training efficiency and robustness

L Cao, B Zhang, C Chen, Y Yang, X Du… - arXiv preprint arXiv …, 2023 - arxiv.org

The CLIP (Contrastive Language-Image Pre-training) model and its variants are becoming
the de facto backbone in many applications. However, training a CLIP model from hundreds …

Cited by 21 (2 versions)

[PDF] arxiv.org

End-to-end semi-supervised approach with modulated object queries for table detection in documents

I Ehsan, T Shehzadi, D Stricker, MZ Afzal - International Journal on …, 2024 - Springer

Table detection, a pivotal task in document analysis, aims to precisely recognize and locate
tables within document images. Although deep learning has shown remarkable progress in …

Cited by 4 (3 versions)

[PDF] thecvf.com

SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap

D Kim, Y Kim, DH Kim, Y Lim… - Proceedings of the …, 2023 - openaccess.thecvf.com

Inspired by the great success of language model (LM)-based pre-training, recent studies in
visual document understanding have explored LM-based pre-training methods for modeling …

Cited by 2 (5 versions)

Hierarchical visual-semantic interaction for scene text recognition

L Diao, X Tang, J Wang, G Xie, J Hu - Information Fusion, 2024 - Elsevier

Proper interaction between visual and semantic features is crucial to obtain a powerful
feature representation for scene text recognition (STR). The existing interaction methods …

Cited by 4 (2 versions)
