From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention

R Zhang, J Han, C Liu, P Gao, A Zhou, X Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
We present LLaMA-Adapter, a lightweight adaptation method to efficiently fine-tune LLaMA
into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter …
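
The zero-init attention named in this title is, at a high level, a learnable gate initialized to zero that controls how much the inserted adaptation prompts influence the frozen model. The following is a simplified, hypothetical PyTorch sketch of that gating idea (single-head attention over the prompts, a tanh-squashed scalar gate); it is not the paper's exact formulation, and the class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class ZeroInitPromptGate(nn.Module):
    """Simplified sketch: adaptation prompts whose contribution is gated
    by a zero-initialized scalar, so training starts from the frozen
    model's original behavior."""

    def __init__(self, num_prompts: int, dim: int):
        super().__init__()
        # Learnable prompt tokens injected alongside a frozen block.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        # Gate starts at zero: prompts contribute nothing at step 0.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, dim) activations of a frozen block.
        batch = hidden_states.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Cross-attend from the sequence to the prompts
        # (simplified: one scaled dot-product attention, no multi-head split).
        attn = torch.softmax(
            hidden_states @ prompts.transpose(1, 2) / hidden_states.size(-1) ** 0.5,
            dim=-1,
        )
        prompt_context = attn @ prompts
        # Zero-initialized gating blends the adapter path in gradually.
        return hidden_states + torch.tanh(self.gate) * prompt_context
```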

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International …, 2023 - proceedings.mlr.press
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

T2I-CompBench: A comprehensive benchmark for open-world compositional text-to-image generation

K Huang, K Sun, E Xie, Z Li… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite the stunning ability of recent text-to-image models to generate high-quality images,
current approaches often struggle to effectively compose objects with different attributes and …

EVA-CLIP: Improved training techniques for CLIP at scale

Q Sun, Y Fang, L Wu, X Wang, Y Cao - arXiv preprint arXiv:2303.15389, 2023 - arxiv.org
Contrastive language-image pre-training, CLIP for short, has gained increasing attention for
its potential in various scenarios. In this paper, we propose EVA-CLIP, a series of models …

Improving clip training with language rewrites

L Fan, D Krishnan, P Isola… - Advances in Neural …, 2023 - proceedings.neurips.cc
Contrastive Language-Image Pre-training (CLIP) stands as one of the most effective
and scalable methods for training transferable vision models using paired image and text …
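
Both CLIP-related entries above revolve around the same contrastive image-text objective. Below is a minimal PyTorch sketch of a generic CLIP-style symmetric cross-entropy (InfoNCE) loss, not either paper's implementation; the function name and the temperature value of 0.07 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    # Normalize embeddings so the dot product is a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity logits for a batch of N aligned (image, text) pairs.
    logits = image_features @ text_features.t() / temperature

    # The i-th image matches the i-th text: targets are the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    # Toy usage with random features: batch of 8 pairs, embedding dim 512.
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(clip_contrastive_loss(img, txt))
```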

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data for training deep neural
networks (DNNs), and they usually train a DNN for each single visual recognition task …

EVA-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …
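
The EVA-02 snippet describes masked image modeling against language-aligned (CLIP) feature targets. The following is a rough, hypothetical PyTorch sketch of such a masked feature-regression loss; the negative-cosine formulation, the helper name, and the use of a frozen CLIP encoder as the teacher are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def masked_feature_loss(student_feats, teacher_feats, mask):
    # student_feats, teacher_feats: (batch, num_patches, dim) patch features,
    # where the teacher features come from a frozen language-aligned encoder.
    # mask: (batch, num_patches) boolean, True where a patch was masked out.
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    m = mask.float()
    # Per-patch cosine similarity between student prediction and teacher target.
    cos = (s * t).sum(dim=-1)
    # Negative cosine similarity, averaged over masked patches only.
    return -(cos * m).sum() / m.sum().clamp(min=1.0)
```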