Google Scholar

Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier

Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

Cited by 39

Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

Q Zeng, Z Wang, Y Cheung, M Jiang - arXiv preprint arXiv:2408.08989, 2024 - arxiv.org

While image-to-text models have demonstrated significant advancements in various
vision-language tasks, they remain susceptible to adversarial attacks. Existing white-box attacks on …

Cited by 2

Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning

J Li, Z Mao, H Li, W Chen, Y Zhang - ACM Transactions on Multimedia …, 2024 - dl.acm.org

Image captioning (IC), bringing vision to language, has drawn extensive attention. A crucial
aspect of IC is the accurate depiction of visual relations among image objects. Visual …

Cited by 6

NumCap: a number-controlled multi-caption image captioning network

A Abdussalam, Z Ye, A Hawbani, M Al-Qatf… - ACM Transactions on …, 2023 - dl.acm.org

Image captioning is a promising task that attracted researchers in the last few years. Existing
image captioning models are primarily trained to generate one caption per image. However …

Cited by 11

Multi-scale motivated neural network for image-text matching

X Qin, L Li, G Pang - Multimedia Tools and Applications, 2024 - Springer

Existing mainstream image-text matching methods usually measure the relevance of
image-text pairs by capturing and aggregating the affinities between textual words and visual …

Cited by 5

Video captioning by learning from global sentence and looking ahead

TZ Niu, ZD Chen, X Luo, PF Zhang, Z Huang… - ACM Transactions on …, 2023 - dl.acm.org

Video captioning aims to automatically generate natural language sentences describing the
content of a video. Although encoder-decoder-based models have achieved promising …

Cited by 4

Semantic enhanced video captioning with multi-feature fusion

TZ Niu, SS Dong, ZD Chen, X Luo, S Guo… - ACM Transactions on …, 2023 - dl.acm.org

Video captioning aims to automatically describe a video clip with informative sentences. At
present, deep learning-based models have become the mainstream for this task and …

Cited by 4

A²SC: Adversarial Attacks on Subspace Clustering

Y Xu, X Wei, P Dai, X Cao - ACM Transactions on Multimedia Computing …, 2023 - dl.acm.org

Many studies demonstrate that supervised learning techniques are vulnerable to adversarial
examples. However, adversarial threats in unsupervised learning have not drawn sufficient …

Cited by 6

Zero-shot scene graph generation via triplet calibration and reduction

J Li, Y Wang, W Li - ACM Transactions on Multimedia Computing …, 2023 - dl.acm.org

Scene Graph Generation (SGG) plays a pivotal role in downstream vision-language tasks.
Existing SGG methods typically suffer from poor compositional generalizations on unseen …

Cited by 2

Cross-modality interaction reasoning for enhancing vision-language pre-training in image-text retrieval

T Yao, S Peng, L Wang, Y Li, Y Sun - Applied Intelligence, 2024 - Springer

Recent days have seen significant improvements in multi-modal learning made by
Vision-Language Pre-training (VLP) models. However, most of them employ the coarse-grained …

Learning transferable perturbations for image captioning

Deep image captioning: A review of methods, trends and future challenges
