Graph transformers: A survey
Graph transformers are a recent advancement in machine learning, offering a new class of
neural network models for graph-structured data. The synergy between transformers and …
Sentinel mechanism for visual semantic graph-based image captioning
F **ao, N Zhang, W Xue, X Gao - Computers and Electrical Engineering, 2024 - Elsevier
Image captioning aims to generate a description of a given image. However, inherent
representation differences between images and sentences make it difficult to align semantic …
Divide and Conquer: Isolating Normal-Abnormal Attributes in Knowledge Graph-Enhanced Radiology Report Generation
Radiology report generation aims to automatically generate clinical descriptions for
radiology images, reducing the workload of radiologists. Compared to general image …
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Situation recognition refers to the ability of an agent to identify and understand various
situations or contexts based on available information and sensory inputs. It involves the …
RefCap: image captioning with referent objects attributes
S Park, J Paik - Scientific Reports, 2023 - nature.com
In recent years, significant progress has been made in visual-linguistic multi-modality
research, leading to advancements in visual comprehension and its applications in …
A multi-view projection-based object-aware graph network for dense captioning of point clouds
Z Ma, Z Yang, A Mao, S Wen, R Yi, Y Liu - Computers & Graphics, 2025 - Elsevier
3D dense captioning has received increasing attention in the multimodal field of 3D
vision and language. This task aims to generate a specific descriptive sentence for each …
Eye-movement-prompted large image captioning model
Z Yang, B Han, X Gao, ZH Zhan - Pattern Recognition, 2025 - Elsevier
Pretrained large vision-language models have shown outstanding performance on the task
of image captioning. However, owing to the insufficient decoding of image features, existing …
EdgeScan for IoT Contextual Understanding With Edge Computing and Image Captioning
DA Hafeth, M Al-khafajiy… - IEEE Internet of Things …, 2024 - ieeexplore.ieee.org
The emergence of Edge Computing has shifted the processing capabilities in proximity to
the Internet of Things data sources, offering solutions to latency and bandwidth constraints …
Mining informativeness in scene graphs: Prioritizing informative relations in Scene Graph Generation for enhanced performance in applications
Learning to compose visual relationships from raw images in the form of scene graphs is a
highly challenging Computer Vision task, yet it is essential for applications related to scene …