Deep image captioning: A review of methods, trends and future challenges
Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …
Automatic image and video caption generation with deep learning: A concise review and algorithmic overlap
Methodologies that utilize Deep Learning offer great potential for applications that
automatically attempt to generate captions or descriptions about images and video frames …
How much can CLIP benefit vision-and-language tasks?
Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using
a relatively small set of manually-annotated data (as compared to web-crawled data), to …
Auto-encoding scene graphs for image captioning
Abstract We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language
inductive bias into the encoder-decoder image captioning framework for more human-like …
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models
Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision
and language tasks. However, their ability to reason about spatial arrangements remains …
Causal attention for vision-language tasks
We present a novel attention mechanism: Causal Attention (CATT), to remove the ever-
elusive confounding effect in existing attention-based vision-language models. This effect …
Object hallucination in image captioning
Despite continuously improving performance, contemporary image captioning models are
prone to "hallucinating" objects that are not actually in a scene. One problem is that standard …
On hallucination and predictive uncertainty in conditional language generation
Despite improvements in performances on different natural language generation tasks, deep
neural models are prone to hallucinating facts that are incorrect or nonexistent. Different …
FuseCap: Leveraging large language models for enriched fused image captions
The advent of vision-language pre-training techniques enhanced substantial progress in the
development of models for image captioning. However, these models frequently produce …
Understanding and evaluating racial biases in image captioning
Image captioning is an important task for benchmarking visual reasoning and for enabling
accessibility for people with vision impairments. However, as in many machine learning …