From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Deep learning approaches on image captioning: A review

T Ghandi, H Pourreza, H Mahyar - ACM Computing Surveys, 2023 - dl.acm.org
Image captioning is a research area of immense importance, aiming to generate natural
language descriptions for visual content in the form of still images. The advent of deep …

Skeleton-based action recognition via spatial and temporal transformer networks

C Plizzari, M Cannici, M Matteucci - Computer Vision and Image …, 2021 - Elsevier
Skeleton-based Human Activity Recognition has achieved great interest in recent
years as skeleton data has demonstrated being robust to illumination changes, body scales …

Spatial-temporal transformer for dynamic scene graph generation

Y Cong, W Liao, H Ackermann… - Proceedings of the …, 2021 - openaccess.thecvf.com
Dynamic scene graph generation aims at generating a scene graph of the given video.
Compared to the task of scene graph generation from images, it is more challenging …

Why did the AI make that decision? Towards an explainable artificial intelligence (XAI) for autonomous driving systems

J Dong, S Chen, M Miralinaghi, T Chen, P Li… - … research part C …, 2023 - Elsevier
User trust has been identified as a critical issue that is pivotal to the success of autonomous
vehicle (AV) operations where artificial intelligence (AI) is widely adopted. For such …

β-Variational autoencoders and transformers for reduced-order modelling of fluid flows

A Solera-Rico, C Sanmiguel Vila… - Nature …, 2024 - nature.com
Variational autoencoder architectures have the potential to develop reduced-order models
for chaotic fluid flows. We propose a method for learning compact and near-orthogonal …

Automated radiographic report generation purely on transformer: A multicriteria supervised approach

Z Wang, H Han, L Wang, X Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Automated radiographic report generation is challenging in at least two aspects. First,
medical images are very similar to each other and the visual differences of clinic importance …

Dynamic scene graph generation via anticipatory pre-training

Y Li, X Yang, C Xu - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Humans can not only see the collection of objects in visual scenes, but also identify the
relationship between objects. The visual relationship in the scene can be abstracted into the …

Dual graph convolutional networks with transformer and curriculum learning for image captioning

X Dong, C Long, W Xu, C Xiao - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Existing image captioning methods just focus on understanding the relationship between
objects or instances in a single image, without exploring the contextual correlation existed …

ReFormer: The relational transformer for image captioning

X Yang, Y Liu, X Wang - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
Image captioning is shown to be able to achieve a better performance by using scene
graphs to represent the relations of objects in the image. The current captioning encoders …