Evaluating text-to-visual generation with image-to-text generation
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …
Visual programming for step-by-step text-to-image generation and evaluation
As large language models have demonstrated impressive performance in many domains,
recent works have adopted language models (LMs) as controllers of visual modules for …
Davidsonian scene graph: Improving reliability in fine-grained evaluation for text-to-image generation
Evaluating text-to-image models is notoriously difficult. A strong recent approach for
assessing text-image faithfulness is based on QG/A (question generation and answering) …
DOCCI: Descriptions of connected and contrasting images
Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T)
research. However, current datasets lack descriptions with fine-grained detail that would …
VideoPrism: A foundational visual encoder for video understanding
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video
understanding tasks with a single frozen model. We pretrain VideoPrism on a …
Evaluating and improving compositional text-to-visual generation
While text-to-visual models now produce photo-realistic images and videos, they struggle
with compositional text prompts involving attributes, relationships, and higher-order …
Contrastive region guidance: Improving grounding in vision-language models without training
Highlighting particularly relevant regions of an image can improve the performance of vision-
language models (VLMs) on various vision-language (VL) tasks by guiding the model to …
A survey on advancements in image-text multimodal models: From general techniques to biomedical implementations
With the significant advancements of Large Language Models (LLMs) in the field of Natural
Language Processing (NLP), the development of image-text multimodal models has …
DreamMatcher: Appearance matching self-attention for semantically-consistent text-to-image personalization
The objective of text-to-image (T2I) personalization is to customize a diffusion model to a
user-provided reference concept, generating diverse images of the concept aligned with the …
FineMatch: Aspect-Based Fine-Grained Image and Text Mismatch Detection and Correction
Recent progress in large-scale pre-training has led to the development of advanced vision-
language models (VLMs) with remarkable proficiency in comprehending and generating …