Turning a CLIP model into a scene text detector

W Yu, Y Liu, W Hua, D Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent large-scale Contrastive Language-Image Pretraining (CLIP) model has shown
great potential in various downstream tasks via leveraging the pretrained vision and …

OmniParser: A unified framework for text spotting, key information extraction and table recognition

J Wan, S Song, W Yu, Y Liu, W Cheng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recently, visually-situated text parsing (VsTP) has experienced notable advancements,
driven by the increasing demand for automated document understanding and the …

ODM: A text-image further alignment pre-training approach for scene text detection and spotting

C Duan, P Fu, S Guo, Q Jiang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
In recent years, text-image joint pre-training techniques have shown promising results in
various tasks. However, in Optical Character Recognition (OCR) tasks, aligning text instances …

MaskOCR: Text recognition with masked encoder-decoder pretraining

P Lyu, C Zhang, S Liu, M Qiao, Y Xu, L Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Text images contain both visual and linguistic information. However, existing pre-training
techniques for text recognition mainly focus on either visual representation learning or …

Towards robust real-time scene text detection: From semantic to instance representation learning

X Qin, P Lyu, C Zhang, Y Zhou, K Yao… - Proceedings of the 31st …, 2023 - dl.acm.org
Due to the flexible representation of arbitrary-shaped scene text and a simple pipeline, bottom-
up segmentation-based methods are becoming mainstream in real-time scene text detection …

Modeling entities as semantic points for visual information extraction in the wild

Z Yang, R Long, P Wang, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, Visual Information Extraction (VIE) has become increasingly
important in both academia and industry, due to the wide range of real-world applications …

Less is more: Removing text-regions improves clip training efficiency and robustness

L Cao, B Zhang, C Chen, Y Yang, X Du… - arXiv preprint arXiv …, 2023 - arxiv.org
The CLIP (Contrastive Language-Image Pre-training) model and its variants are becoming
the de facto backbone in many applications. However, training a CLIP model from hundreds …

Document parsing unveiled: Techniques, challenges, and prospects for structured information extraction

Q Zhang, VSJ Huang, B Wang, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Document parsing is essential for converting unstructured and semi-structured documents,
such as contracts, academic papers, and invoices, into structured, machine-readable data …

Turning a CLIP model into a scene text spotter

W Yu, Y Liu, X Zhu, H Cao, X Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP)
model to enhance scene text detection and spotting tasks, transforming it into a robust …

Zero-shot object counting with good exemplars

H Zhu, J Yuan, Z Yang, Y Guo, Z Wang… - … on Computer Vision, 2024 - Springer
Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names
of object classes during testing, without the need for manual annotations. However, a critical …