GLASS: Global to local attention for scene-text spotting

R Ronen, S Tsiper, O Anschel, I Lavi… - … on Computer Vision, 2022 - Springer
In recent years, the dominant paradigm for text spotting is to combine the tasks of text
detection and recognition into a single end-to-end framework. Under this paradigm, both …

Exploring stroke-level modifications for scene text editing

Y Qu, Q Tan, H Xie, J Xu, Y Wang… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Scene text editing (STE) aims to replace text with the desired one while preserving
background and styles of the original text. However, due to the complicated background …

Multi-view correlation distillation for incremental object detection

D Yang, Y Zhou, A Zhang, X Sun, D Wu, W Wang… - Pattern Recognition, 2022 - Elsevier
In real applications, new object classes often emerge after the detection model has been
trained on a prepared dataset with fixed classes. Fine-tuning the old model with only new …

Towards robust real-time scene text detection: From semantic to instance representation learning

X Qin, P Lyu, C Zhang, Y Zhou, K Yao… - Proceedings of the 31st …, 2023 - dl.acm.org
Due to the flexible representation of arbitrary-shaped scene text and simple pipeline,
bottom-up segmentation-based methods begin to be mainstream in real-time scene text detection …

Beyond OCR + VQA: Towards end-to-end reading and reasoning for robust and accurate TextVQA

G Zeng, Y Zhang, Y Zhou, X Yang, N Jiang, G Zhao… - Pattern Recognition, 2023 - Elsevier
Text-based visual question answering (TextVQA), which answers a visual question by
considering both visual contents and scene texts, has attracted increasing attention recently …

Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables

H Shen, X Gao, J Wei, L Qiao, Y Zhou, Q Li, Z Cheng - IJCAI, 2023 - researchgate.net
Recent advanced Table Structure Recognition (TSR) models adopt image-to-text
solutions to parse table structure. These methods can be formulated as image caption …

TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation

W Wang, Y Zhou, J Lv, D Wu, G Zhao, N Jiang… - Proceedings of the 30th …, 2022 - dl.acm.org
The research focus of scene text detection and recognition has shifted to arbitrary shape text
in recent years, where the text shape representation is a fundamental problem. An ideal …

Perceiving ambiguity and semantics without recognition: an efficient and effective ambiguous scene text detector

Y Shu, W Wang, Y Zhou, S Liu, A Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org
Ambiguous scene text detection is an extremely challenging task. Existing text detectors that
rely solely on visual cues often suffer from confusion due to being evenly distributed in …

First creating backgrounds then rendering texts: A new paradigm for visual text blending

Z Li, Y Shu, W Zeng, D Yang, Y Zhou - arXiv preprint arXiv:2410.10168, 2024 - arxiv.org
Diffusion models, known for their impressive image generation abilities, have played a
pivotal role in the rise of visual text generation. Nevertheless, existing visual text generation …

Filling in the blank: Rationale-augmented prompt tuning for TextVQA

G Zeng, Y Zhang, Y Zhou, B Fang, G Zhao… - Proceedings of the 31st …, 2023 - dl.acm.org
Recently, generative Text-based visual question answering (TextVQA) methods, which are
often based on language models, have exhibited impressive results and drawn increasing …