From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Contrastive attention for automatic chest X-ray report generation

F Liu, C Yin, X Wu, S Ge, Y Zou, P Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, chest X-ray report generation, which aims to automatically generate descriptions
of given chest X-ray images, has received growing research interests. The key challenge of …

Syntax-aware action targeting for video captioning

Q Zheng, C Wang, D Tao - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
Existing methods on video captioning have made great efforts to identify objects/instances in
videos, but few of them emphasize the prediction of action. As a result, the learned models …

Rumor detection on social media with graph adversarial contrastive learning

T Sun, Z Qian, S Dong, P Li, Q Zhu - … of the ACM Web Conference 2022, 2022 - dl.acm.org
Rumors spread through the Internet, especially on Twitter, have harmed social stability and
residents' daily lives. Recently, in addition to utilizing the text features of posts for rumor …

Fine-grained image captioning with CLIP reward

J Cho, S Yoon, A Kale, F Dernoncourt, T Bui… - arXiv preprint arXiv …, 2022 - arxiv.org
Modern image captioning models are usually trained with text similarity objectives. However,
since reference captions in public datasets often describe the most salient common objects …

Neural sign language translation based on human keypoint estimation

SK Ko, CJ Kim, H Jung, C Cho - Applied Sciences, 2019 - mdpi.com
We propose a sign language translation system based on human keypoint estimation. It is
well-known that many problems in the field of computer vision require a massive dataset to …

ContraBERT: Enhancing code pre-trained models via contrastive learning

S Liu, B Wu, X Xie, G Meng, Y Liu - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Large-scale pre-trained models such as CodeBERT, GraphCodeBERT have earned
widespread attention from both academia and industry. Attributed to the superior ability in …

Discriminability objective for training descriptive captions

R Luo, B Price, S Cohen… - Proceedings of the …, 2018 - openaccess.thecvf.com
One property that remains lacking in image captions generated by contemporary methods is
discriminability: being able to tell two images apart given the caption for one of them. We …

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …