Evolution of visual data captioning methods, datasets, and evaluation metrics: A comprehensive survey

D Sharma, C Dhiman, D Kumar - Expert Systems with Applications, 2023 - Elsevier
Abstract: Automatic Visual Captioning (AVC) generates syntactically and semantically correct
sentences by describing important objects, attributes, and their relationships with each other …

Prototype completion with primitive knowledge for few-shot learning

B Zhang, X Li, Y Ye, Z Huang… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Few-shot learning is a challenging task, which aims to learn a classifier for novel classes
with few examples. Pre-training based meta-learning methods effectively tackle the problem …

New ideas and trends in deep multimodal content understanding: A review

W Chen, W Wang, L Liu, MS Lew - Neurocomputing, 2021 - Elsevier
The focus of this survey is on the analysis of two modalities of multimodal deep learning:
image and text. Unlike classic reviews of deep learning where monomodal image classifiers …

An end-to-end visual-audio attention network for emotion recognition in user-generated videos

S Zhao, Y Ma, Y Gu, J Yang, T Xing, P Xu… - Proceedings of the …, 2020 - ojs.aaai.org
Emotion recognition in user-generated videos plays an important role in human-centered
computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., …

Noise augmented double-stream graph convolutional networks for image captioning

L Wu, M Xu, L Sang, T Yao, T Mei - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Image captioning, aiming at generating natural sentences to describe image contents, has
received significant attention with remarkable improvements in recent advances. The …

Context-aware emotion cause analysis with multi-attention-based neural network

X Li, S Feng, D Wang, Y Zhang - Knowledge-Based Systems, 2019 - Elsevier
Emotion cause analysis has elicited wide interest in both academia and industry, and aims
to identify the reasons behind certain emotions expressed in text. Most of the current studies …

PDANet: Polarity-consistent deep attention network for fine-grained visual emotion regression

S Zhao, Z Jia, H Chen, L Li, G Ding… - Proceedings of the 27th …, 2019 - dl.acm.org
Existing methods on visual emotion analysis mainly focus on coarse-grained emotion
classification, i.e., assigning an image a dominant discrete emotion category. However …

Progressive tree-structured prototype network for end-to-end image captioning

P Zeng, J Zhu, J Song, L Gao - … of the 30th ACM international conference …, 2022 - dl.acm.org
Studies of image captioning are shifting towards a trend of a fully end-to-end paradigm by
leveraging powerful visual pre-trained models and transformer-based generation …

Bornon: Bengali image captioning with transformer-based deep learning approach

F Muhammad Shah, M Humaira, MARK Jim… - SN Computer …, 2022 - Springer
The encoder–decoder approach to image captioning, in which a CNN serves as the
encoder and a sequence generator such as an RNN as the decoder, has proven to be very effective …

Attribute guided fusion network for obtaining fine-grained image captions

MB Hossen, Z Ye, A Abdussalam, FE Wahab - Multimedia Tools and …, 2024 - Springer
Fine-grained image captioning is gaining traction in multimedia, merging vision-to-language
tasks, with attribute selection now recognized as pivotal in improving performance. While …