Image inpainting: A review

O Elharrouss, N Almaadeed, S Al-Maadeed… - Neural Processing …, 2020 - Springer
Although image inpainting, or the art of repairing the old and deteriorated images, has been
around for many years, it has recently gained even more popularity, because of the recent …

Scene text detection and recognition: The deep learning era

S Long, X He, C Yao - International Journal of Computer Vision, 2021 - Springer
With the rise and development of deep learning, computer vision has been tremendously
transformed and reshaped. As an important research area in computer vision, scene text …

Seed-bench: Benchmarking multimodal large language models

B Li, Y Ge, Y Ge, G Wang, R Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) building upon the foundation of powerful large
language models (LLMs) have recently demonstrated exceptional capabilities in generating …

Lvlm-ehub: A comprehensive evaluation benchmark for large vision-language models

P Xu, W Shao, K Zhang, P Gao, S Liu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation …

Git: A generative image-to-text transformer for vision and language

J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu… - arxiv preprint arxiv …, 2022 - arxiv.org
In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify
vision-language tasks such as image/video captioning and question answering. While …

Adaptive rotated convolution for rotated object detection

Y Pu, Y Wang, Z **a, Y Han, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Rotated object detection aims to identify and locate objects in images with arbitrary
orientation. In this scenario, the oriented directions of objects vary considerably across …

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

Trocr: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H **e, Y Wang, Z Mao… - Proceedings of the …, 2021 - openaccess.thecvf.com
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …

Conditional text image generation with diffusion models

Y Zhu, Z Li, T Wang, M He… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Current text recognition systems, including those for handwritten scripts and scene text, have
relied heavily on image synthesis and augmentation, since it is difficult to realize real-world …