M-FFN: multi-scale feature fusion network for image captioning

J Prudviraj, C Vishnu, CK Mohan - Applied Intelligence, 2022 - Springer
In this work, we present a novel multi-scale feature fusion network (M-FFN) for image
captioning task to incorporate discriminative features and scene contextual information of an …

Attentive contextual network for image captioning

J Prudviraj, C Vishnu, CK Mohan - 2021 International Joint …, 2021 - ieeexplore.ieee.org
Existing image captioning approaches fail to generate fine-grained captions due to the lack
of rich encoding representation of an image. In this paper, we present an attentive contextual …

Explicit disentanglement of appearance and perspective in generative models

N Skafte, S Hauberg - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Disentangled representation learning finds compact, independent and easy-to-interpret
factors of the data. Learning such has been shown to require an inductive bias, which we …

Jointly aligning millions of images with deep penalised reconstruction congealing

R Annunziata, C Sagonas… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Extrapolating fine-grained pixel-level correspondences in a fully unsupervised manner from
a large set of misaligned images can benefit several computer vision and graphics …

Feature map augmentation to improve scale invariance in convolutional neural networks

D Kumar, D Sharma - Journal of Artificial Intelligence and Soft …, 2023 - sciendo.com
Introducing variation in the training dataset through data augmentation has been a popular
technique to make Convolutional Neural Networks (CNNs) spatially invariant but leads to …

Adjoint rigid transform network: Task-conditioned alignment of 3d shapes

K Zhou, BL Bhatnagar, B Schiele… - … conference on 3D …, 2022 - ieeexplore.ieee.org
Most learning methods for 3D data suffer significant performance drops when the data is not
carefully aligned to a canonical orientation. Aligning real world 3D data collected from …

Cot-DCN-YOLO: Self-attention-enhancing YOLOv8s for detecting garbage bins in urban street view images

S Dong, W Xu, H Zhang, L Gong - The Egyptian Journal of Remote Sensing …, 2025 - Elsevier
Accurately and quickly obtaining information from garbage bins has great application value
in smart city construction and urban environmental management. However, existing deep …

Multi-modal information extraction and fusion with convolutional neural networks for classification of scaled images

D Kumar - 2020 - researchprofiles.canberra.edu.au
Developing computational algorithms to model the biological vision system has challenged
researchers in the computer vision field for several decades. As a result, state-of-the-art …