ESTextSpotter: Towards better scene text spotting with explicit synergy in transformer

M Huang, J Zhang, D Peng, H Lu… - Proceedings of the …, 2023 - openaccess.thecvf.com
In recent years, end-to-end scene text spotting approaches are evolving to the Transformer-
based framework. While previous studies have shown the crucial importance of the intrinsic …

Empowering agrifood system with artificial intelligence: A survey of the progress, challenges and opportunities

T Chen, L Lv, D Wang, J Zhang, Y Yang, Z Zhao… - ACM Computing …, 2024 - dl.acm.org
With the world population rapidly increasing, transforming our agrifood systems to be more
productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages …

Exploring OCR capabilities of GPT-4V(ision): A quantitative and in-depth evaluation

Y Shi, D Peng, W Liao, Z Lin, X Chen, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents a comprehensive evaluation of the Optical Character Recognition
(OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model …

Parrot captions teach CLIP to spot text

Y Lin, C He, AJ Wang, B Wang, W Li… - European Conference on …, 2024 - Springer
Despite CLIP being the foundation model in numerous vision-language applications, CLIP
suffers from a severe text spotting bias. Such bias causes CLIP models to 'Parrot' the visual …

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

J Wan, S Song, W Yu, Y Liu, W Cheng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recently, visually-situated text parsing (VsTP) has experienced notable advancements,
driven by the increasing demand for automated document understanding and the …

Turning a CLIP model into a scene text spotter

W Yu, Y Liu, X Zhu, H Cao, X Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP)
model to enhance scene text detection and spotting tasks, transforming it into a robust …

DNTextSpotter: Arbitrary-shaped scene text spotting via improved denoising training

Q Qiao, Y **e, J Gao, T Wu, S Huang, J Fan… - Proceedings of the …, 2024 - dl.acm.org
More and more end-to-end text spotting methods based on Transformer architecture have
demonstrated superior performance. These methods utilize a bipartite graph matching …

Platypus: A generalized specialist model for reading text in various forms

P Wang, Z Li, J Tang, H Zhong, F Huang… - … on Computer Vision, 2024 - Springer
Reading text from images (either natural scenes or documents) has been a long-standing
research topic for decades, due to the high technical challenge and wide application range …

Hyper-local deformable transformers for text spotting on historical maps

Y Lin, YY Chiang - Proceedings of the 30th ACM SIGKDD Conference …, 2024 - dl.acm.org
Text on historical maps contains valuable information providing georeferenced historical,
political, and cultural contexts. However, text extraction from historical maps has been …

Bridging the Gap Between End-to-End and Two-Step Text Spotting

M Huang, H Li, Y Liu, X Bai… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Modularity plays a crucial role in the development and maintenance of complex systems.
While end-to-end text spotting efficiently mitigates the issues of error accumulation and sub …