Evaluating text-to-visual generation with image-to-text generation
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …
Evaluating and improving compositional text-to-visual generation
While text-to-visual models now produce photo-realistic images and videos, they struggle
with compositional text prompts involving attributes, relationships, and higher-order …
A survey on evaluation of multimodal large language models
J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic the human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …
Synthesize, diagnose and optimize: Towards fine-grained vision-language understanding
W Peng, S **e, Z You, S Lan… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Vision-language models (VLMs) have demonstrated remarkable performance across various
downstream tasks. However, understanding fine-grained visual-linguistic concepts such as …
Robust noisy correspondence learning with equivariant similarity consistency
Y Yang, L Wang, E Yang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The surge in multi-modal data has propelled cross-modal matching to the forefront of
research interest. However, the challenge lies in the laborious and expensive process of …
Auto-encoding morph-tokens for multimodal LLM
For multimodal LLMs, the synergy of visual comprehension (textual output) and generation
(visual output) presents an ongoing challenge. This is due to a conflicting objective: for …
TripletCLIP: Improving compositional reasoning of CLIP via synthetic vision-language negatives
Abstract Contrastive Language-Image Pretraining (CLIP) models maximize the mutual
information between text and visual modalities to learn representations. This makes the …
Revisiting the role of language priors in vision-language models
Vision-language models (VLMs) are impactful in part because they can be applied to a
variety of visual understanding tasks in a zero-shot fashion, without any fine-tuning. We …
VisMin: Visual minimal-change understanding
Fine-grained understanding of objects, attributes, and relationships between objects is
crucial for visual-language models (VLMs). To evaluate VLMs' fine-grained understanding …
RankCLIP: Ranking-consistent language-image pretraining
Self-supervised contrastive learning models, such as CLIP, have set new benchmarks for
vision-language models in many downstream tasks. However, their dependency on rigid …