Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier
Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

Exploring deep learning-based architecture, strategies, applications and current trends in generic object detection: A comprehensive review

L Aziz, MSBH Salam, UU Sheikh, S Ayub - IEEE Access, 2020 - ieeexplore.ieee.org
Object detection is a fundamental but challenging issue in the field of generic image
analysis; it plays an important role in a wide range of applications and has been receiving …

Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD conference …, 2022 - dl.acm.org
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …

Attribute prototype network for zero-shot learning

W Xu, Y Xian, J Wang, B Schiele… - Advances in Neural …, 2020 - proceedings.neurips.cc
From the beginning of zero-shot learning research, visual attributes have been shown to
play an important role. In order to better transfer attribute-based knowledge from known to …

Occlusion aware facial expression recognition using CNN with attention mechanism

Y Li, J Zeng, S Shan, X Chen - IEEE Transactions on Image …, 2018 - ieeexplore.ieee.org
Facial expression recognition in the wild is challenging due to various unconstrained
conditions. Although existing facial expression classifiers have been almost perfect on …

Transferable attention for domain adaptation

X Wang, L Li, W Ye, M Long, J Wang - Proceedings of the AAAI …, 2019 - ojs.aaai.org
Recent work in domain adaptation bridges different domains by adversarially learning a
domain-invariant representation that cannot be distinguished by a domain discriminator …

High-resolution remote sensing image captioning based on structured attention

R Zhao, Z Shi, Z Zou - IEEE Transactions on Geoscience and …, 2021 - ieeexplore.ieee.org
Automatically generating language descriptions of remote sensing images has become an
emerging research hot spot in the remote sensing field. Attention-based captioning, as a …

BiCro: Noisy correspondence rectification for multi-modality data via bi-directional cross-modal similarity consistency

S Yang, Z Xu, K Wang, Y You, H Yao… - Proceedings of the …, 2023 - openaccess.thecvf.com
As one of the most fundamental techniques in multimodal learning, cross-modal matching
aims to project various sensory modalities into a shared feature space. To achieve this …

Visual News: Benchmark and challenges in news image captioning

F Liu, Y Wang, T Wang, V Ordonez - arXiv preprint arXiv:2010.03743, 2020 - arxiv.org
We propose Visual News Captioner, an entity-aware model for the task of news image
captioning. We also introduce Visual News, a large-scale benchmark consisting of more …

Global visual feature and linguistic state guided attention for remote sensing image captioning

Z Zhang, W Zhang, M Yan, X Gao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The encoder–decoder framework is prevalent in existing remote-sensing image captioning
(RSIC) models. The appearance of attention mechanisms brings significant results …