Smallcap: lightweight image captioning prompted with retrieval augmentation
Recent advances in image captioning have focused on scaling the data and model size,
substantially increasing the cost of pre-training and finetuning. As an alternative to large …
The unreasonable effectiveness of CLIP features for image captioning: an experimental analysis
Generating textual descriptions from visual inputs is a fundamental step towards machine
intelligence, as it entails modeling the connections between the visual and textual …
Positive-augmented contrastive learning for image and video captioning evaluation
The CLIP model has recently been proven to be very effective for a variety of cross-modal
tasks, including the evaluation of captions generated from vision-and-language …
With a little help from your own past: Prototypical memory networks for image captioning
Image captioning, like many tasks involving vision and language, currently relies on
Transformer-based architectures for extracting the semantics in an image and translating it …
Retrieval-augmented transformer for image captioning
Image captioning models aim at connecting Vision and Language by providing natural
language descriptions of input images. In the past few years, the task has been tackled by …
Retrieval-augmented image captioning
Inspired by retrieval-augmented language generation and pretrained Vision and Language
(V&L) encoders, we present a new approach to image captioning that generates sentences …
Cross-domain image captioning with discriminative finetuning
Neural captioners are typically trained to mimic human-generated references without
optimizing for any specific communication goal, leading to problems such as the generation …
Fashion-oriented image captioning with external knowledge retrieval and fully attentive gates
Research related to fashion and e-commerce domains is gaining attention in computer
vision and multimedia communities. Following this trend, this article tackles the task of …
LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement
While recent low-light image enhancement (LLIE) methods have made significant
advancements, they still face challenges in terms of low visual quality and weak …
A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions
K Zhang, P Li, J Wang - Remote Sensing, 2024
Remote sensing images contain a wealth of Earth-observation information. Efficient
extraction and application of hidden knowledge from these images will greatly promote the …