Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey

D Sharma, C Dhiman, D Kumar - Expert Systems with Applications, 2023 - Elsevier
Automatic Visual Captioning (AVC) generates syntactically and semantically correct
sentences by describing important objects, attributes, and their relationships with each other …

A survey on advancements in image-text multimodal models: From general techniques to biomedical implementations

R Guo, J Wei, L Sun, B Yu, G Chang, D Liu… - Computers in biology …, 2024 - Elsevier
With the significant advancements of Large Language Models (LLMs) in the field of Natural
Language Processing (NLP), the development of image-text multimodal models has …

Adaptive path selection for dynamic image captioning

T Xian, Z Li, Z Tang, H Ma - … on Circuits and Systems for Video …, 2022 - ieeexplore.ieee.org
Image captioning is a challenging task, i.e., given an image, a machine automatically generates
natural language that matches its semantic content and has attracted much attention in …

Split, embed and merge: An accurate table structure recognizer

Z Zhang, J Zhang, J Du, F Wang - Pattern Recognition, 2022 - Elsevier
Table structure recognition is an essential part for making machines understand tables. Its
main task is to recognize the internal structure of a table. However, due to the complexity …

Exploring fine-grained image-text alignment for referring remote sensing image segmentation

S Lei, X Xiao, T Zhang, HC Li, Z Shi… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Given a language expression, referring remote sensing image segmentation (RRSIS) aims
to identify ground objects and assign pixelwise labels within the imagery. One of the key …

Transformer-based local-global guidance for image captioning

H Parvin, AR Naghsh-Nilchi, HM Mohammadi - Expert Systems with …, 2023 - Elsevier
Image captioning is a difficult problem for machine learning algorithms to compress huge
amounts of images into descriptive languages. The recurrent models are popularly used as …

A multi-layer memory sharing network for video captioning

TZ Niu, SS Dong, ZD Chen, X Luo, Z Huang, S Guo… - Pattern Recognition, 2023 - Elsevier
Over the past several years, video captioning has received much attention in computer
vision and machine learning communities. Many models utilize an RNN-based decoder to …

Protect, show, attend and tell: Empowering image captioning models with ownership protection

JH Lim, CS Chan, KW Ng, L Fan, Q Yang - Pattern Recognition, 2022 - Elsevier
By and large, existing Intellectual Property (IP) protection on deep neural networks typically
i) focus on image classification task only, and ii) follow a standard digital watermarking …

Image captioning using transformer-based double attention network

H Parvin, AR Naghsh-Nilchi, HM Mohammadi - Engineering Applications of …, 2023 - Elsevier
Image captioning generates a human-like description for a query image, which has attracted
considerable attention recently. The most broadly utilized model for image description is an …

Divergent-convergent attention for image captioning

J Ji, Z Du, X Zhang - Pattern Recognition, 2021 - Elsevier
Attention mechanism has made great progress in image captioning, where semantic words
or local regions are selectively embedded into the language model. However, current …