Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models
Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …
Prefix-diffusion: A lightweight diffusion model for diverse image captioning
G Liu, Y Li, Z Fei, H Fu, X Luo, Y Guo - arXiv preprint arXiv:2309.04965, 2023 - arxiv.org
While impressive performance has been achieved in image captioning, the limited diversity
of the generated captions and the large parameter scale remain major barriers to the real …
Attractive storyteller: Stylized visual storytelling with unpaired text
Most research on stylized image captioning aims to generate style-specific captions using
unpaired text, and has achieved impressive performance for simple styles like positive and …
A Character-Centric Creative Story Generation via Imagination
Creative story generation has long been a goal of NLP research. While existing
methodologies have aimed to generate long and coherent stories, they fall significantly short …
Which one are you referring to? Multimodal object identification in situated dialogue
The demand for multimodal dialogue systems has been rising in various domains,
emphasizing the importance of interpreting multimodal inputs from conversational and …
VScript: Controllable script generation with visual presentation
In order to offer a customized script tool and inspire professional scriptwriters, we present
VScript. It is a controllable pipeline that generates complete scripts, including dialogues and …
Style-unaware meta-learning for generalizable person re-identification
J Shao, P Cai - Journal of Electronic Imaging, 2024 - spiedigitallibrary.org
Due to the influence of domain bias, domain generalization person re-identification models
are not capable of generalizing well on unseen domains. The style factor is a critical factor …
Visualizing the Unseen: Arabic Image-to-Story Generation Using Deep Learning Techniques
E Saleh, C Sabty - Pacific Rim International Conference on Artificial …, 2024 - Springer
Images are integral to our digital experiences, and combining visual elements with verbal
storytelling is crucial. While English image captioning has progressed significantly, Arabic …