Smallcap: lightweight image captioning prompted with retrieval augmentation

R Ramos, B Martins, D Elliott… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in image captioning have focused on scaling the data and model size,
substantially increasing the cost of pre-training and finetuning. As an alternative to large …

The unreasonable effectiveness of CLIP features for image captioning: an experimental analysis

M Barraco, M Cornia, S Cascianelli… - Proceedings of the …, 2022 - openaccess.thecvf.com
Generating textual descriptions from visual inputs is a fundamental step towards machine
intelligence, as it entails modeling the connections between the visual and textual …

Positive-augmented contrastive learning for image and video captioning evaluation

S Sarto, M Barraco, M Cornia… - Proceedings of the …, 2023 - openaccess.thecvf.com
The CLIP model has been recently proven to be very effective for a variety of cross-modal
tasks, including the evaluation of captions generated from vision-and-language …

With a little help from your own past: Prototypical memory networks for image captioning

M Barraco, S Sarto, M Cornia… - Proceedings of the …, 2023 - openaccess.thecvf.com
Image captioning, like many tasks involving vision and language, currently relies on
Transformer-based architectures for extracting the semantics in an image and translating it …

Retrieval-augmented transformer for image captioning

S Sarto, M Cornia, L Baraldi, R Cucchiara - Proceedings of the 19th …, 2022 - dl.acm.org
Image captioning models aim at connecting Vision and Language by providing natural
language descriptions of input images. In the past few years, the task has been tackled by …

Retrieval-augmented image captioning

R Ramos, D Elliott, B Martins - arXiv preprint arXiv:2302.08268, 2023 - arxiv.org
Inspired by retrieval-augmented language generation and pretrained Vision and Language
(V&L) encoders, we present a new approach to image captioning that generates sentences …

Cross-domain image captioning with discriminative finetuning

R Dessì, M Bevilacqua, E Gualdoni… - Proceedings of the …, 2023 - openaccess.thecvf.com
Neural captioners are typically trained to mimic human-generated references without
optimizing for any specific communication goal, leading to problems such as the generation …

Fashion-oriented image captioning with external knowledge retrieval and fully attentive gates

N Moratelli, M Barraco, D Morelli, M Cornia, L Baraldi… - Sensors, 2023 - mdpi.com
Research related to fashion and e-commerce domains is gaining attention in computer
vision and multimedia communities. Following this trend, this article tackles the task of …

LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement

Y Yu, F Chen, J Yu, Z Kan - European Conference on Computer Vision, 2024 - Springer
While recent low-light image enhancement (LLIE) methods have made significant
advancements, they still face challenges in terms of low visual quality and weak …

A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions

K Zhang, P Li, J Wang - Remote Sensing, 2024 - mdpi.com
Remote sensing images contain a wealth of Earth-observation information. Efficient
extraction and application of hidden knowledge from these images will greatly promote the …