Visuals to text: A comprehensive review on automatic image captioning
Y Ming, N Hu, C Fan, F Feng… - IEEE/CAA Journal of …, 2022 - researchportal.port.ac.uk
Image captioning refers to automatic generation of descriptive texts according to the visual
content of images. It is a technique integrating multiple disciplines including the computer …
SmallCap: lightweight image captioning prompted with retrieval augmentation
Recent advances in image captioning have focused on scaling the data and model size,
substantially increasing the cost of pre-training and finetuning. As an alternative to large …
Implicit identity representation conditioned memory compensation network for talking head video generation
Talking head video generation aims to animate a human face in a still image with dynamic
poses and expressions using motion information derived from a target-driving video, while …
Retrieval-augmented generation for AI-generated content: A survey
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …
DeeCap: Dynamic early exiting for efficient image captioning
Both accuracy and efficiency are crucial for image captioning in real-world scenarios.
Although Transformer-based models have gained significant improved captioning …
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Large language models (LLMs)-based image captioning has the capability of describing
objects not explicitly observed in training data; yet novel objects occur frequently …
Memory-based augmentation network for video captioning
Video captioning focuses on generating natural language descriptions according to the
video content. Existing works mainly explore this multimodal learning with the paired source …
Attention-aligned transformer for image captioning
Z Fei - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Recently, attention-based image captioning models, which are expected to ground correct
image regions for proper word generations, have achieved remarkable performance …
Retrieval-augmented image captioning
Inspired by retrieval-augmented language generation and pretrained Vision and Language
(V&L) encoders, we present a new approach to image captioning that generates sentences …
Visual cluster grounding for image captioning
Attention mechanisms have been extensively adopted in vision and language tasks such as
image captioning. It encourages a captioning model to dynamically ground appropriate …