Mining core information by evaluating semantic importance for unpaired image captioning

J Wei, Z Li, C Zhang, H Ma - Neural Networks, 2024 - Elsevier
Recently, exciting progress has been made in the research of supervised image captioning.
However, manually annotated image-annotation pair data is difficult and expensive to …

Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning

Z Liu, J Liu, F Ma - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Although image captioning models have made significant advancements in recent years, the
majority of them heavily depend on high-quality datasets containing paired images and texts …

Cross-Modal Coherence-Enhanced Feedback Prompting for News Captioning

N Xu, Y Gao, TT Zhang, H Tian, AA Liu - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
News Captioning involves generating the descriptions for news images based on the
detailed content of related news articles. Given that these articles often contain extensive …

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning

J Luo, J Chen, Y Li, Y Pan, J Feng, H Chao… - European Conference on …, 2024 - Springer
Recently, zero-shot image captioning has gained increasing attention, where only text data
is available for training. The remarkable progress in text-to-image diffusion model presents …

Enhancing Image Captioning Using Deep Convolutional Generative Adversarial Networks

T Jaiswal, M Pandey, P Tripathi - Recent Advances in …, 2024 - ingentaconnect.com
Introduction: Image caption generation has long been a fundamental challenge in the area
of computer vision (CV) and natural language processing (NLP). In this research, we present …

CVLP-NaVD: Contrastive Visual-Language Pre-training Models for Non-annotated Visual Description

H Li, Y Hao, J Yu, B Zhu, S Wang, T Xu - ACM Transactions on …, 2024 - dl.acm.org
Non-annotated visual description (NaVD) aims to describe generic visuals without
human-annotated pairwise data. The generic visuals refer to images and videos. Existing works …

Pseudo Content Hallucination for Unpaired Image Captioning

H Ben, S Wang, M Wang, R Hong - Proceedings of the 2024 …, 2024 - dl.acm.org
Unpaired Image Captioning (UIC) is designed to describe an image without relying on
matched vision-language training data. It is a challenging task since (1) the implicit and …

Dynamic text prompt joint multimodal features for accurate plant disease image captioning

F Liang, Z Huang, W Wang, Z He, Q En - The Visual Computer, 2024 - Springer
Plant disease captioning is crucial for agricultural pest and disease prevention. However,
generating accurate captions for plant disease images remains challenging because of the …

Exploring annotation-free image captioning with retrieval-augmented pseudo sentence generation

Z Li, D Liu, H Wang, C Zhang, W Cai - Proceedings of the 6th ACM …, 2024 - dl.acm.org
Recently, training an image captioner without annotated image-sentence pairs has gained
traction. Previous methods have faced limitations due to either using mismatched corpora for …

Can Language Improve Visual Features For Distinguishing Unseen Plant Diseases?

JZ Liaw, AYH Chai, SH Lee, P Bonnet… - … Conference on Pattern …, 2025 - Springer
Deep learning approaches have been pivotal in identifying multi-plant diseases, yet they
often struggle with unseen data. The challenge of handling unseen data is significant due to …