Deep Multimodal Data Fusion

F Zhao, C Zhang, B Geng - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data
(e.g., images, texts, or data collected from different sensors), feature engineering (e.g. …

Cited by 34

[PDF] arxiv.org

Graph transformers: A survey

A Shehzad, F Xia, S Abid, C Peng, S Yu… - arXiv preprint arXiv …, 2024 - arxiv.org

Graph transformers are a recent advancement in machine learning, offering a new class of
neural network models for graph-structured data. The synergy between transformers and …

Cited by 11 · HTML version

Sentinel mechanism for visual semantic graph-based image captioning

F Xiao, N Zhang, W Xue, X Gao - Computers and Electrical Engineering, 2024 - Elsevier

Image captioning aims to generate a description of a given image. However, inherent
representation differences between images and sentences make it difficult to align semantic …

Cited by 1 · All 2 versions

[PDF] openreview.net

Divide and Conquer: Isolating Normal-Abnormal Attributes in Knowledge Graph-Enhanced Radiology Report Generation

X Liang, Y Zhang, D Wang, H Zhong, R Li… - Proceedings of the 32nd …, 2024 - dl.acm.org

Radiology report generation aims to automatically generate clinical descriptions for
radiology images, reducing the workload of radiologists. Compared to general image …

Cited by 1 · All 2 versions

[PDF] arxiv.org

Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos

D Verma, D Roy, B Fernando - arXiv preprint arXiv:2407.20642, 2024 - arxiv.org

Situation recognition refers to the ability of an agent to identify and understand various
situations or contexts based on available information and sensory inputs. It involves the …

Cited by 1 · All 2 versions · HTML version

[PDF] nature.com

RefCap: image captioning with referent objects attributes

S Park, J Paik - Scientific Reports, 2023 - nature.com

In recent years, significant progress has been made in visual-linguistic multi-modality
research, leading to advancements in visual comprehension and its applications in …

Cited by 3 · All 10 versions

A multi-view projection-based object-aware graph network for dense captioning of point clouds

Z Ma, Z Yang, A Mao, S Wen, R Yi, Y Liu - Computers & Graphics, 2025 - Elsevier

3D dense captioning has received increasing attention in the multimodal field of 3D
vision and language. This task aims to generate a specific descriptive sentence for each …

Eye-movement-prompted large image captioning model

Z Yang, B Han, X Gao, ZH Zhan - Pattern Recognition, 2025 - Elsevier

Pretrained large vision-language models have shown outstanding performance on the task
of image captioning. However, owing to the insufficient decoding of image features, existing …

All 2 versions

[PDF] ieee.org

EdgeScan for IoT Contextual Understanding With Edge Computing and Image Captioning

DA Hafeth, M Al-khafajiy… - IEEE Internet of Things …, 2024 - ieeexplore.ieee.org

The emergence of Edge Computing has shifted the processing capabilities in proximity to
the Internet of Things data sources, offering solutions to latency and bandwidth constraints …

Mining informativeness in scene graphs: Prioritizing informative relations in Scene Graph Generation for enhanced performance in applications

M Neau, PE Santos, AG Bosser, A Macvicar… - Pattern Recognition …, 2025 - Elsevier

Learning to compose visual relationships from raw images in the form of scene graphs is a
highly challenging Computer Vision task, yet it is essential for applications related to scene …

All 2 versions

Transforming visual scene graphs to image captions
