Trocr: Transformer-based optical character recognition with pre-trained models
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …
approaches are usually built based on CNN for image understanding and RNN for char …
Scene text recognition with permuted autoregressive sequence models
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting
Scene text spotting is of great importance to the computer vision community due to its wide
variety of applications. Recent methods attempt to introduce linguistic knowledge for …
variety of applications. Recent methods attempt to introduce linguistic knowledge for …
Continuous human action recognition for human-machine interaction: a review
With advances in data-driven machine learning research, a wide variety of prediction
models have been proposed to capture spatio-temporal features for the analysis of video …
models have been proposed to capture spatio-temporal features for the analysis of video …
Reading and writing: Discriminative and generative modeling for self-supervised text recognition
Existing text recognition methods usually need large-scale training data. Most of them rely
on synthetic training data due to the lack of annotated real images. However, there is a …
on synthetic training data due to the lack of annotated real images. However, there is a …
[HTML][HTML] Human–AI Collaboration for Remote Sighted Assistance: Perspectives from the LLM Era
Remote sighted assistance (RSA) has emerged as a conversational technology aiding
people with visual impairments (VI) through real-time video chat communication with sighted …
people with visual impairments (VI) through real-time video chat communication with sighted …
Sketch2Saliency: learning to detect salient objects from human drawings
Human sketch has already proved its worth in various visual understanding tasks (eg,
retrieval, segmentation, image-captioning, etc). In this paper, we reveal a new trait of …
retrieval, segmentation, image-captioning, etc). In this paper, we reveal a new trait of …
Multi-modal text recognition networks: Interactive enhancements between visual and semantic features
Linguistic knowledge has brought great benefits to scene text recognition by providing
semantics to refine character sequences. However, since linguistic knowledge has been …
semantics to refine character sequences. However, since linguistic knowledge has been …
Cdistnet: Perceiving multi-domain character distance for robust text recognition
The transformer-based encoder-decoder framework is becoming popular in scene text
recognition, largely because it naturally integrates recognition clues from both visual and …
recognition, largely because it naturally integrates recognition clues from both visual and …
Dtrocr: Decoder-only transformer for optical character recognition
M Fujitake - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Typical text recognition methods rely on an encoder-decoder structure, in which the encoder
extracts features from an image, and the decoder produces recognized text from these …
extracts features from an image, and the decoder produces recognized text from these …