Google Scholar

Image inpainting: A review

O Elharrouss, N Almaadeed, S Al-Maadeed… - Neural Processing …, 2020 - Springer

Although image inpainting, or the art of repairing the old and deteriorated images, has been
around for many years, it has recently gained even more popularity, because of the recent …

Cited by 409; 11 versions

Scene text detection and recognition: The deep learning era

S Long, X He, C Yao - International Journal of Computer Vision, 2021 - Springer

With the rise and development of deep learning, computer vision has been tremendously
transformed and reshaped. As an important research area in computer vision, scene text …

Cited by 533; 9 versions

SEED-Bench: Benchmarking multimodal large language models

B Li, Y Ge, Y Ge, G Wang, R Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multimodal large language models (MLLMs) building upon the foundation of powerful large
language models (LLMs) have recently demonstrated exceptional capabilities in generating …

Cited by 137; 7 versions

LVLM-eHub: A comprehensive evaluation benchmark for large vision-language models

P Xu, W Shao, K Zhang, P Gao, S Liu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation …

Cited by 177; 6 versions

GIT: A generative image-to-text transformer for vision and language

J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify
vision-language tasks such as image/video captioning and question answering. While …

Cited by 562; 4 versions

Adaptive rotated convolution for rotated object detection

Y Pu, Y Wang, Z Xia, Y Han, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Rotated object detection aims to identify and locate objects in images with arbitrary
orientation. In this scenario, the oriented directions of objects vary considerably across …

Cited by 103; 7 versions

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer

Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

Cited by 206; 8 versions

TrOCR: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org

Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Cited by 468; 6 versions

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H Xie, Y Wang, Z Mao… - Proceedings of the …, 2021 - openaccess.thecvf.com

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …

Cited by 422; 7 versions

Conditional text image generation with diffusion models

Y Zhu, Z Li, T Wang, M He… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Current text recognition systems, including those for handwritten scripts and scene text, have
relied heavily on image synthesis and augmentation, since it is difficult to realize real-world …

Cited by 63; 5 versions

ICDAR 2013 robust reading competition

Image inpainting: A review
