Text recognition in the wild: A survey
The history of text can be traced back over thousands of years. Rich and precise semantic
information carried by text is important in a wide range of vision-based application …
information carried by text is important in a wide range of vision-based application …
How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
Deepseek-vl: towards real-world vision-language understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …
world vision and language understanding applications. Our approach is structured around …
Scene text recognition with permuted autoregressive sequence models
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
Boxinst: High-performance instance segmentation with box annotations
We present a high-performance method that can achieve mask-level instance segmentation
with only bounding-box annotations for training. While this setting has been studied in the …
with only bounding-box annotations for training. While this setting has been studied in the …
Swintextspotter: Scene text spotting via better synergy between text detection and text recognition
End-to-end scene text spotting has attracted great attention in recent years due to the
success of excavating the intrinsic synergy of the scene text detection and recognition …
success of excavating the intrinsic synergy of the scene text detection and recognition …
On the hidden mystery of ocr in large multimodal models
Large models have recently played a dominant role in natural language processing and
multimodal vision-language learning. However, their effectiveness in text-related visual …
multimodal vision-language learning. However, their effectiveness in text-related visual …
Estextspotter: Towards better scene text spotting with explicit synergy in transformer
In recent years, end-to-end scene text spotting approaches are evolving to the Transformer-
based framework. While previous studies have shown the crucial importance of the intrinsic …
based framework. While previous studies have shown the crucial importance of the intrinsic …
Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting
End-to-end text-spotting, which aims to integrate detection and recognition in a unified
framework, has attracted increasing attention due to its simplicity of the two complimentary …
framework, has attracted increasing attention due to its simplicity of the two complimentary …
RTFN: A robust temporal feature network for time series classification
Time series data usually contains local and global patterns. Most of the existing feature
networks focus on local features rather than the relationships among them. The latter is also …
networks focus on local features rather than the relationships among them. The latter is also …