Mining core information by evaluating semantic importance for unpaired image captioning
Recently, exciting progress has been made in the research of supervised image captioning.
However, manually annotated image-annotation pair data is difficult and expensive to …
Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning
Z Liu, J Liu, F Ma - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Although image captioning models have made significant advancements in recent years, the
majority of them heavily depend on high-quality datasets containing paired images and texts …
Cross-Modal Coherence-Enhanced Feedback Prompting for News Captioning
News Captioning involves generating the descriptions for news images based on the
detailed content of related news articles. Given that these articles often contain extensive …
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Recently, zero-shot image captioning has gained increasing attention, where only text data
is available for training. The remarkable progress in text-to-image diffusion model presents …
Enhancing Image Captioning Using Deep Convolutional Generative Adversarial Networks
Introduction: Image caption generation has long been a fundamental challenge in the area
of computer vision (CV) and natural language processing (NLP). In this research, we present …
CVLP-NaVD: Contrastive Visual-Language Pre-training Models for Non-annotated Visual Description
Non-annotated visual description (NaVD) aims to describe generic visuals without human-
annotated pairwise data. The generic visuals refer to images and videos. Existing works …
Pseudo Content Hallucination for Unpaired Image Captioning
Unpaired Image Captioning (UIC) is designed to describe an image without relying on
matched vision-language training data. It is a challenging task since (1) the implicit and …
Dynamic text prompt joint multimodal features for accurate plant disease image captioning
Plant disease captioning is crucial for agricultural pest and disease prevention. However,
generating accurate captions for plant disease images remains challenging because of the …
Exploring annotation-free image captioning with retrieval-augmented pseudo sentence generation
Recently, training an image captioner without annotated image-sentence pairs has gained
traction. Previous methods have faced limitations due to either using mismatched corpora for …
Can Language Improve Visual Features For Distinguishing Unseen Plant Diseases?
Deep learning approaches have been pivotal in identifying multi-plant diseases, yet they
often struggle with unseen data. The challenge of handling unseen data is significant due to …