From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
Deep learning approaches on image captioning: A review
Image captioning is a research area of immense importance, aiming to generate natural
language descriptions for visual content in the form of still images. The advent of deep …
Skeleton-based action recognition via spatial and temporal transformer networks
Skeleton-based Human Activity Recognition has achieved great interest in recent
years as skeleton data has demonstrated being robust to illumination changes, body scales …
Spatial-temporal transformer for dynamic scene graph generation
Dynamic scene graph generation aims at generating a scene graph of the given video.
Compared to the task of scene graph generation from images, it is more challenging …
Why did the AI make that decision? Towards an explainable artificial intelligence (XAI) for autonomous driving systems
User trust has been identified as a critical issue that is pivotal to the success of autonomous
vehicle (AV) operations where artificial intelligence (AI) is widely adopted. For such …
β-Variational autoencoders and transformers for reduced-order modelling of fluid flows
Variational autoencoder architectures have the potential to develop reduced-order models
for chaotic fluid flows. We propose a method for learning compact and near-orthogonal …
Automated radiographic report generation purely on transformer: A multicriteria supervised approach
Automated radiographic report generation is challenging in at least two aspects. First,
medical images are very similar to each other and the visual differences of clinic importance …
Dynamic scene graph generation via anticipatory pre-training
Humans can not only see the collection of objects in visual scenes, but also identify the
relationship between objects. The visual relationship in the scene can be abstracted into the …
Dual graph convolutional networks with transformer and curriculum learning for image captioning
Existing image captioning methods just focus on understanding the relationship between
objects or instances in a single image, without exploring the contextual correlation existed …
Reformer: The relational transformer for image captioning
Image captioning is shown to be able to achieve a better performance by using scene
graphs to represent the relations of objects in the image. The current captioning encoders …