Google Scholar

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org

As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

Cited by 209, 4 versions

[PDF] arxiv.org

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Cited by 393, 12 versions

[PDF] arxiv.org

ClipCap: CLIP prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org

Image captioning is a fundamental task in vision-language understanding, where the model
predicts a textual informative caption to a given input image. In this paper, we present a …

Cited by 771, 2 versions

[PDF] thecvf.com

Multiscale vision transformers

H Fan, B Xiong, K Mangalam, Y Li… - Proceedings of the …, 2021 - openaccess.thecvf.com

We present Multiscale Vision Transformers (MViT) for video and image recognition,
by connecting the seminal idea of multiscale feature hierarchies with transformer models …

Cited by 1565, 6 versions

[PDF] arxiv.org

ImageNet-21K pretraining for the masses

T Ridnik, E Ben-Baruch, A Noy… - arXiv preprint arXiv …, 2021 - arxiv.org

ImageNet-1K serves as the primary dataset for pretraining deep learning models for
computer vision tasks. ImageNet-21K dataset, which is bigger and more diverse, is used …

Cited by 732, 7 versions

[PDF] arxiv.org

Remote sensing image change detection with transformers

H Chen, Z Qi, Z Shi - IEEE Transactions on Geoscience and …, 2021 - ieeexplore.ieee.org

Modern change detection (CD) has achieved remarkable success by the powerful
discriminative ability of deep convolutions. However, high-resolution remote sensing CD …

Cited by 1107, 3 versions

[PDF] thecvf.com

DetCLIPv2: Scalable open-vocabulary object detection pre-training via word-region alignment

L Yao, J Han, X Liang, D Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com

This paper presents DetCLIPv2, an efficient and scalable training framework that
incorporates large-scale image-text pairs to achieve open-vocabulary object detection …

Cited by 81, 5 versions

[PDF] arxiv.org

Image Captioning in news report scenario

T Liu, Q Cai, C Xu, B Hong, J Xiong, Y Qiao… - arXiv preprint arXiv …, 2024 - arxiv.org

Image captioning strives to generate pertinent captions for specified images, situating itself
at the crossroads of Computer Vision (CV) and Natural Language Processing (NLP). This …

Cited by 55, 2 versions

[PDF] arxiv.org

RelTR: Relation transformer for scene graph generation

Y Cong, MY Yang, B Rosenhahn - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Different objects in the same scene are more or less related to each other, but only a limited
number of these relationships are noteworthy. Inspired by Detection Transformer, which …

Cited by 172, 12 versions

[PDF] thecvf.com

PRIOR: Prototype representation joint learning from medical images and reports

P Cheng, L Lin, J Lyu, Y Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Contrastive learning based vision-language joint pre-training has emerged as a successful
representation learning strategy. In this paper, we present a prototype representation …

Cited by 54, 6 versions

Cptr: Full transformer network for image captioning

A complete survey on generative ai (aigc): Is chatgpt from gpt-4 to gpt-5 all you need?
