Google Scholar

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 197

[PDF] arxiv.org

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Cited by 395

[PDF] arxiv.org

ClipCap: CLIP prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org

Image captioning is a fundamental task in vision-language understanding, where the model
predicts a textual informative caption to a given input image. In this paper, we present a …

Cited by 767

[PDF] arxiv.org

Scaling open-vocabulary image segmentation with image-level labels

G Ghiasi, X Gu, Y Cui, TY Lin - European Conference on Computer Vision, 2022 - Springer

We design an open-vocabulary image segmentation model to organize an image into
meaningful regions indicated by arbitrary texts. Recent works (CLIP and ALIGN), despite …

Cited by 457

[PDF] arxiv.org

Large language models: A survey

S Minaee, T Mikolov, N Nikzad, M Chenaghlu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have drawn a lot of attention due to their strong
performance on a wide range of natural language tasks, since the release of ChatGPT in …

Cited by 541

[PDF] arxiv.org

Making the most of text semantics to improve biomedical vision–language processing

B Boecking, N Usuyama, S Bannur, DC Castro… - European conference on …, 2022 - Springer

Multi-modal data abounds in biomedicine, such as radiology images and reports.
Interpreting this data at scale is essential for improving clinical care and accelerating clinical …

Cited by 252

[PDF] thecvf.com

VinVL: Revisiting visual representations in vision-language models

P Zhang, X Li, X Hu, J Yang, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com

This paper presents a detailed study of improving vision features and develops an improved
object detection model for vision language (VL) tasks. Compared to the most widely used …

Cited by 1125

[PDF] arxiv.org

BLEURT: Learning robust metrics for text generation

T Sellam, D Das, AP Parikh - arXiv preprint arXiv:2004.04696, 2020 - arxiv.org

Text generation has made significant advances in the last few years. Yet, evaluation metrics
have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate …

Cited by 1497

[PDF] arxiv.org

Suppress and balance: A simple gated network for salient object detection

X Zhao, Y Pang, L Zhang, H Lu, L Zhang - Computer Vision–ECCV 2020 …, 2020 - Springer

Most salient object detection approaches use U-Net or feature pyramid networks (FPN) as
their basic structures. These methods ignore two key problems when the encoder …

Cited by 530

[PDF] thecvf.com

Attention on attention for image captioning

L Huang, W Wang, J Chen… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Attention mechanisms are widely used in current encoder/decoder frameworks of image
captioning, where a weighted average on encoded vectors is generated at each time step to …

Cited by 1140

From captions to visual concepts and back

Vision-language pre-training: Basics, recent advances, and future trends
