Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

Prefix-diffusion: A lightweight diffusion model for diverse image captioning

G Liu, Y Li, Z Fei, H Fu, X Luo, Y Guo - arXiv preprint arXiv:2309.04965, 2023 - arxiv.org
While impressive performance has been achieved in image captioning, the limited diversity
of the generated captions and the large parameter scale remain major barriers to the real …

Attractive storyteller: Stylized visual storytelling with unpaired text

D Yang, Q Jin - Proceedings of the 61st Annual Meeting of the …, 2023 - aclanthology.org
Most research on stylized image captioning aims to generate style-specific captions using
unpaired text, and has achieved impressive performance for simple styles like positive and …

A Character-Centric Creative Story Generation via Imagination

K Park, M Kim, K Jung - arXiv preprint arXiv:2409.16667, 2024 - arxiv.org
Creative story generation has long been a goal of NLP research. While existing
methodologies have aimed to generate long and coherent stories, they fall significantly short …

Which one are you referring to? Multimodal object identification in situated dialogue

H Lovenia, S Cahyawijaya, P Fung - arXiv preprint arXiv:2302.14680, 2023 - arxiv.org
The demand for multimodal dialogue systems has been rising in various domains,
emphasizing the importance of interpreting multimodal inputs from conversational and …

VScript: Controllable script generation with visual presentation

Z Ji, Y Xu, I Cheng, S Cahyawijaya, R Frieske… - arXiv preprint arXiv …, 2022 - arxiv.org
In order to offer a customized script tool and inspire professional scriptwriters, we present
VScript. It is a controllable pipeline that generates complete scripts, including dialogues and …

Style-unaware meta-learning for generalizable person re-identification

J Shao, P Cai - Journal of Electronic Imaging, 2024 - spiedigitallibrary.org
Due to the influence of domain bias, domain generalization person re-identification models
are not capable of generalizing well on unseen domains. The style factor is a critical factor …

Visualizing the Unseen: Arabic Image-to-Story Generation Using Deep Learning Techniques

E Saleh, C Sabty - Pacific Rim International Conference on Artificial …, 2024 - Springer
Images are integral to our digital experiences, and combining visual elements with verbal
storytelling is crucial. While English image captioning has progressed significantly, Arabic …