Google Scholar

Normalized and geometry-aware self-attention network for image captioning

L Guo, J Liu, X Zhu, P Yao, S Lu… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Self-attention (SA) network has shown profound value in image captioning. In this paper, we
improve SA from two aspects to promote the performance of image captioning. First, we …

Cited by 262 · All 9 versions · HTML version

[PDF] jair.org

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org

Abstract Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …

Cited by 161 · All 9 versions · HTML version

[PDF] researchgate.net

Adaptive path selection for dynamic image captioning

T Xian, Z Li, Z Tang, H Ma - … on Circuits and Systems for Video …, 2022 - ieeexplore.ieee.org

Image captioning is a challenging task, i.e., given an image, a machine automatically generates
natural language that matches its semantic content, and it has attracted much attention in …

Cited by 53 · All 2 versions

Vision-enhanced and consensus-aware transformer for image captioning

S Cao, G An, Z Zheng, Z Wang - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org

Image captioning generates descriptions in a natural language for a given image. Due to its
great potential for a wide range of applications, many deep-learning-based methods have …

Cited by 45

Joint embedding of deep visual and semantic features for medical image report generation

Y Yang, J Yu, J Zhang, W Han, H Jiang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Medical image report generation (MeIRG) aims at generating associated diagnosis
descriptions with natural language sentences from medical images, which is essential in the …

Cited by 46 · All 2 versions

[PDF] arxiv.org

Prompt-based learning for unpaired image captioning

P Zhu, X Wang, L Zhu, Z Sun, WS Zheng… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

Unpaired Image Captioning (UIC) has been developed to learn image descriptions from
unaligned vision-language sample pairs. Existing works usually tackle this task using …

Cited by 36 · All 8 versions

Visual cluster grounding for image captioning

W Jiang, M Zhu, Y Fang, G Shi… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Attention mechanisms have been extensively adopted in vision and language tasks such as
image captioning. They encourage a captioning model to dynamically ground appropriate …
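The grounding idea in the snippet above can be illustrated with a generic soft-attention step. This is a minimal sketch, not the method of Jiang et al.: it assumes that image-region features and the decoder state are plain vectors of equal dimension, and it uses scaled dot-product scoring. The function names `softmax` and `attend` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float], regions: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Generic soft attention (illustrative assumption, not a paper's method).

    Scores each region feature by its scaled dot product with the decoder
    query, normalizes the scores with softmax, and returns the attention
    weights together with the weighted sum of region features (the context).
    """
    d = len(query)
    # Scaled dot-product similarity between the query and each region.
    scores = [sum(q * r for q, r in zip(query, region)) / math.sqrt(d) for region in regions]
    weights = softmax(scores)
    # Context vector: per-dimension weighted sum over all regions.
    context = [sum(w * region[i] for w, region in zip(weights, regions)) for i in range(d)]
    return weights, context


# Usage: the region most similar to the query receives the largest weight.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

Recomputing the weights at every decoding step with a fresh query is what makes the grounding "dynamic": each generated word can draw on different image regions.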

Cited by 31 · All 5 versions

Image difference captioning with instance-level fine-grained feature representation

Q Huang, Y Liang, J Wei, Y Cai, H Liang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

The task of image difference captioning aims at locating changed objects in similar image
pairs and describing the difference with natural language. The key challenges of this task …

Cited by 46 · All 2 versions

[PDF] arxiv.org

Dual attention on pyramid feature maps for image captioning

L Yu, J Zhang, Q Wu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org

Generating natural sentences from images is a fundamental learning task for visual-
semantic understanding in multimedia. In this paper, we propose to apply dual attention on …

Cited by 53 · All 4 versions

[PDF] bjtu.edu.cn

Deep reinforcement polishing network for video captioning

W Xu, J Yu, Z Miao, L Wan, Y Tian… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

The video captioning task aims to describe video content using several natural-language
sentences. Although one-step encoder-decoder models have achieved promising progress …

Cited by 51 · All 3 versions

Show, tell, and polish: Ruminant decoding for image captioning

Normalized and geometry-aware self-attention network for image captioning