TrOCR: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting

S Fang, Z Mao, H Xie, Y Wang, C Yan… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Scene text spotting is of great importance to the computer vision community due to its wide
variety of applications. Recent methods attempt to introduce linguistic knowledge for …

Continuous human action recognition for human-machine interaction: a review

H Gammulle, D Ahmedt-Aristizabal, S Denman… - ACM Computing …, 2023 - dl.acm.org
With advances in data-driven machine learning research, a wide variety of prediction
models have been proposed to capture spatio-temporal features for the analysis of video …

Reading and writing: Discriminative and generative modeling for self-supervised text recognition

M Yang, M Liao, P Lu, J Wang, S Zhu, H Luo… - Proceedings of the 30th …, 2022 - dl.acm.org
Existing text recognition methods usually need large-scale training data. Most of them rely
on synthetic training data due to the lack of annotated real images. However, there is a …

Human–AI Collaboration for Remote Sighted Assistance: Perspectives from the LLM Era

R Yu, S Lee, J Xie, SM Billah, JM Carroll - Future Internet, 2024 - mdpi.com
Remote sighted assistance (RSA) has emerged as a conversational technology aiding
people with visual impairments (VI) through real-time video chat communication with sighted …

Sketch2Saliency: learning to detect salient objects from human drawings

AK Bhunia, S Koley, A Kumar, A Sain… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human sketch has already proved its worth in various visual understanding tasks (e.g.,
retrieval, segmentation, image-captioning, etc.). In this paper, we reveal a new trait of …

Multi-modal text recognition networks: Interactive enhancements between visual and semantic features

B Na, Y Kim, S Park - European Conference on Computer Vision, 2022 - Springer
Linguistic knowledge has brought great benefits to scene text recognition by providing
semantics to refine character sequences. However, since linguistic knowledge has been …

CDistNet: Perceiving multi-domain character distance for robust text recognition

T Zheng, Z Chen, S Fang, H Xie, YG Jiang - International Journal of …, 2024 - Springer
The transformer-based encoder-decoder framework is becoming popular in scene text
recognition, largely because it naturally integrates recognition clues from both visual and …

DTrOCR: Decoder-only transformer for optical character recognition

M Fujitake - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Typical text recognition methods rely on an encoder-decoder structure, in which the encoder
extracts features from an image, and the decoder produces recognized text from these …