From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR …, 2022 - dl.acm.org
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …

BERTScore: Evaluating text generation with BERT

T Zhang, V Kishore, F Wu, KQ Weinberger… - arXiv preprint arXiv …, 2019 - arxiv.org
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to
common metrics, BERTScore computes a similarity score for each token in the candidate …

MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance

W Zhao, M Peyrard, F Liu, Y Gao, CM Meyer… - arXiv preprint arXiv …, 2019 - arxiv.org
A robust evaluation metric has a profound impact on the development of text generation
systems. A desirable metric compares system output against references based on their …

See, Say, and Segment: Teaching LMMs to Overcome False Premises

TH Wu, G Biamby, D Chan, L Dunlap… - Proceedings of the …, 2024 - openaccess.thecvf.com
Current open-source Large Multimodal Models (LMMs) excel at tasks such as
open-vocabulary language grounding and segmentation but can suffer under false premises …

Object hallucination in image captioning

A Rohrbach, LA Hendricks, K Burns, T Darrell… - arXiv preprint arXiv …, 2018 - arxiv.org
Despite continuously improving performance, contemporary image captioning models are
prone to "hallucinating" objects that are not actually in a scene. One problem is that standard …

ArtEmis: Affective language for visual art

P Achlioptas, M Ovsjanikov… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present a novel large-scale dataset and accompanying machine learning models aimed
at providing a detailed understanding of the interplay between visual content, its emotional …

Positive-augmented contrastive learning for image and video captioning evaluation

S Sarto, M Barraco, M Cornia… - Proceedings of the …, 2023 - openaccess.thecvf.com
The CLIP model has been recently proven to be very effective for a variety of cross-modal
tasks, including the evaluation of captions generated from vision-and-language …

Why did the AI make that decision? Towards an explainable artificial intelligence (XAI) for autonomous driving systems

J Dong, S Chen, M Miralinaghi, T Chen, P Li… - … research part C …, 2023 - Elsevier
User trust has been identified as a critical issue that is pivotal to the success of autonomous
vehicle (AV) operations where artificial intelligence (AI) is widely adopted. For such …

Leveraging large language models for NLG evaluation: A survey

Z Li, X Xu, T Shen, C Xu, JC Gu, C Tao - arXiv e-prints, 2024 - ui.adsabs.harvard.edu
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation,
introducing Large Language Models (LLMs) has opened new avenues for assessing …