BlueLM-V-3B: Algorithm and system co-design for multimodal large language models on mobile devices
The emergence and growing popularity of multimodal large language models (MLLMs) have
significant potential to enhance various aspects of daily life, from improving communication …
Skip \n: A simple method to reduce hallucination in large vision-language models
Recent advancements in large vision-language models (LVLMs) have demonstrated
impressive capability in visual information understanding with human language. Despite …
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Training models with longer in-context lengths is a significant challenge for multimodal
machine learning due to substantial GPU memory and computational costs. This exploratory …
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval
Scene text retrieval aims to find all images containing the query text from an image gallery.
Current efforts tend to adopt an Optical Character Recognition (OCR) pipeline, which …
DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data
In the era of artificial intelligence, the diversity of data modalities and annotation formats
often renders data unusable directly, requiring understanding and format conversion before …
Type-R: Automatically Retouching Typos for Text-to-Image Generation
While recent text-to-image models can generate photorealistic images from text prompts that
reflect detailed instructions, they still face significant challenges in accurately rendering …
Improving text generation on images with synthetic captions
The recent emergence of latent diffusion models such as SDXL [1] and SD 1.5 [2] has shown
significant capability in generating highly detailed and realistic images. Despite their …
Typographic Attacks in a Multi-Image Setting
Large Vision-Language Models (LVLMs) are susceptible to typographic attacks, which are
misclassifications caused by an attack text that is added to an image. In this paper, we …
Extract Free Dense Misalignment from CLIP
Recent vision-language foundation models still frequently produce outputs misaligned with
their inputs, evidenced by object hallucination in captioning and prompt misalignment in the …