Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods

MS Wajid, H Terashima‐Marin, P Najafirad… - Engineering …, 2024 - Wiley Online Library
Generating an image/video caption has long been a fundamental problem in Artificial
Intelligence, usually addressed with deep learning methods …

X-Mesh: Towards fast and accurate text-driven 3D stylization via dynamic textual guidance

Y Ma, X Zhang, X Sun, J Ji, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV)
and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior …

Rotated multi-scale interaction network for referring remote sensing image segmentation

S Liu, Y Ma, X Zhang, H Wang, J Ji… - Proceedings of the …, 2024 - openaccess.thecvf.com
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that
combines computer vision and natural language processing. Traditional Referring Image …

Cross-modality perturbation synergy attack for person re-identification

Y Gong, Z Zhong, Y Qu, Z Luo, R Ji, M Jiang - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, there has been significant research focusing on addressing security
concerns in single-modal person re-identification (ReID) systems that are based on RGB …

Underwater image captioning: Challenges, models, and datasets

H Li, H Wang, Y Zhang, L Li, P Ren - ISPRS Journal of Photogrammetry and …, 2025 - Elsevier
We delve into the nascent field of underwater image captioning from three perspectives:
challenges, models, and datasets. One challenge arises from the disparities between …

3D-GRES: Generalized 3D referring expression segmentation

C Wu, Y Liu, J Ji, Y Ma, H Wang, G Luo… - Proceedings of the …, 2024 - dl.acm.org
3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific
instance within a 3D space based on a natural language description. However, current …

Vision-language pre-training via modal interaction

H Cheng, H Ye, X Zhou, X Liu, F Chen, M Wang - Pattern Recognition, 2024 - Elsevier
Existing vision-language pre-training models typically extract region features and conduct
fine-grained local alignment based on masked image/text completion or object detection …

M3ixup: A multi-modal data augmentation approach for image captioning

Y Li, J Ji, X Sun, Y Zhou, Y Luo, R Ji - Pattern Recognition, 2025 - Elsevier
Despite their great success, most image captioning (IC) models are still stuck in the
dilemma of generating simple, non-discriminative captions. In this paper, we study this …

An ensemble model with attention based mechanism for image captioning

I Al Badarneh, BH Hammo, O Al-Kadi - Computers and Electrical …, 2025 - Elsevier
Image captioning generates informative text from an input image by establishing a relationship
between the words and the actual content of the image. Recently, deep learning models that …

ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor

MB Hossen, Z Ye, A Abdussalam, MA Hossain - Displays, 2024 - Elsevier
Fine-grained image captioning is a focal point in the vision-to-language task and has
attracted considerable attention for generating accurate and contextually relevant image …