DeepSeek-VL: towards real-world vision-language understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world
vision and language understanding applications. Our approach is structured around …
Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …
Scene text recognition with permuted autoregressive sequence models
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
Revisiting scene text recognition: A data perspective
This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective.
We begin by revisiting the six commonly used benchmarks in STR and observe a trend of …
DTrOCR: Decoder-only transformer for optical character recognition
M Fujitake - Proceedings of the IEEE/CVF winter conference …, 2024 - openaccess.thecvf.com
Typical text recognition methods rely on an encoder-decoder structure, in which the encoder
extracts features from an image, and the decoder produces recognized text from these …
MLLMGuard: A multi-dimensional safety evaluation suite for multimodal large language models
Powered by remarkable advancements in Large Language Models (LLMs), Multimodal
Large Language Models (MLLMs) demonstrate impressive capabilities in manifold tasks …
ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting
Scene text spotting is of great importance to the computer vision community due to its wide
variety of applications. Recent methods attempt to introduce linguistic knowledge for …
What if we only use real datasets for scene text recognition? Toward scene text recognition with fewer labels
Scene text recognition (STR) task has a common practice: All state-of-the-art STR models
are trained on large synthetic data. In contrast to this practice, training STR models only on …
LISTER: neighbor decoding for length-insensitive scene text recognition
The diversity in length constitutes a significant characteristic of text. Due to the long-tail
distribution of text lengths, most existing methods for scene text recognition (STR) only work …
CLIP4STR: a simple baseline for scene text recognition with pre-trained vision-language model
Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …