Visuals to text: A comprehensive review on automatic image captioning

Y Ming, N Hu, C Fan, F Feng… - IEEE/CAA Journal of …, 2022 - researchportal.port.ac.uk
Image captioning refers to automatic generation of descriptive texts according to the visual
content of images. It is a technique integrating multiple disciplines including the computer …

SmallCap: lightweight image captioning prompted with retrieval augmentation

R Ramos, B Martins, D Elliott… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in image captioning have focused on scaling the data and model size,
substantially increasing the cost of pre-training and finetuning. As an alternative to large …

Implicit identity representation conditioned memory compensation network for talking head video generation

FT Hong, D Xu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Talking head video generation aims to animate a human face in a still image with dynamic
poses and expressions using motion information derived from a target-driving video, while …

Retrieval-augmented generation for AI-generated content: A survey

P Zhao, H Zhang, Q Yu, Z Wang, Y Geng, F Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …

DeeCap: Dynamic early exiting for efficient image captioning

Z Fei, X Yan, S Wang, Q Tian - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Both accuracy and efficiency are crucial for image captioning in real-world scenarios.
Although Transformer-based models have gained significantly improved captioning …

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

J Li, DM Vo, A Sugimoto… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Large language models (LLMs)-based image captioning has the capability of describing
objects not explicitly observed in training data; yet novel objects occur frequently …

Memory-based augmentation network for video captioning

S Jing, H Zhang, P Zeng, L Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video captioning focuses on generating natural language descriptions according to the
video content. Existing works mainly explore this multimodal learning with the paired source …

Attention-aligned transformer for image captioning

Z Fei - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Recently, attention-based image captioning models, which are expected to ground correct
image regions for proper word generations, have achieved remarkable performance …

Retrieval-augmented image captioning

R Ramos, D Elliott, B Martins - arXiv preprint arXiv:2302.08268, 2023 - arxiv.org
Inspired by retrieval-augmented language generation and pretrained Vision and Language
(V&L) encoders, we present a new approach to image captioning that generates sentences …

Visual cluster grounding for image captioning

W Jiang, M Zhu, Y Fang, G Shi… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Attention mechanisms have been extensively adopted in vision and language tasks such as
image captioning. They encourage a captioning model to dynamically ground appropriate …