A comprehensive review on ensemble deep learning: Opportunities and challenges

A Mohammed, R Kora - Journal of King Saud University - Computer and Information Sciences, 2023 - Elsevier
In machine learning, two approaches outperform traditional algorithms: ensemble learning
and deep learning. The former refers to methods that integrate multiple base models in the …
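
To make "integrate multiple base models" concrete, here is a minimal sketch of one classic ensemble rule, majority voting; the three base models and their predictions are made-up placeholders, not anything taken from the survey.

```python
# Minimal sketch of ensemble averaging by majority vote (illustrative only).
# Several base classifiers vote per sample; the ensemble takes the majority
# class, which typically reduces variance relative to any single model.
import numpy as np

def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """predictions: (n_models, n_samples) array of integer class labels."""
    n_classes = predictions.max() + 1
    # Count votes per class for every sample, then pick the winning class.
    votes = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_classes)
    return votes.argmax(axis=0)

preds = np.array([[0, 1, 1, 0],   # hypothetical model A
                  [0, 1, 0, 0],   # hypothetical model B
                  [1, 1, 1, 0]])  # hypothetical model C
print(majority_vote(preds))       # -> [0 1 1 0]
```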

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
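
For readers who want to try the released models, a hedged usage sketch of loading a DINOv2 backbone through torch.hub and extracting a global image embedding; the entry-point name follows the project's public README, so verify it against the repository before relying on it.

```python
# Usage sketch (assumes network access and the hub entry point from the
# facebookresearch/dinov2 README; treat the names as assumptions).
import torch

model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()

img = torch.randn(1, 3, 224, 224)      # stand-in for a preprocessed image
with torch.no_grad():
    feat = model(img)                  # (1, 384) class-token embedding
print(feat.shape)
```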

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

Reproducible scaling laws for contrastive language-image learning

M Cherti, R Beaumont, R Wightman… - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023 - openaccess.thecvf.com
Scaling up neural networks has led to remarkable performance across a wide range of
tasks. Moreover, performance often follows reliable scaling laws as a function of training set …
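
As background, such scaling laws are usually fit with a saturating power law, error(N) ≈ a·N^(−b) + c in training-set size N; the sketch below fits that generic functional form with SciPy on made-up numbers, not on data from the cited paper.

```python
# Hedged sketch of fitting a power-law scaling curve E(N) = a * N**(-b) + c.
# The (sample count, error) pairs are fabricated placeholders for illustration.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c

n = np.array([1e6, 1e7, 1e8, 1e9])
err = np.array([0.42, 0.31, 0.24, 0.20])
(a, b, c), _ = curve_fit(power_law, n, err, p0=(1.0, 0.1, 0.1), maxfev=10000)
print(f"fitted exponent b ≈ {b:.3f}")  # error decays roughly as N^-b
```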

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their strong model capacity.
However, their remarkable performance is accompanied by heavy computation costs, which …
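
A rough sketch of the group-attention idea the title refers to: each head attends over its own slice of the channels, and each head's output is cascaded into the next slice. Shapes and module structure here are simplified assumptions, not the paper's exact block.

```python
# Simplified cascaded group attention: channels are split across heads, and
# the previous head's output is added to the next head's input slice.
import torch
import torch.nn as nn

class GroupAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.heads, self.hd = heads, dim // heads
        self.qkv = nn.ModuleList(nn.Linear(self.hd, 3 * self.hd) for _ in range(heads))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, N, dim)
        chunks = x.chunk(self.heads, dim=-1)   # one channel slice per head
        outs, prev = [], 0
        for blk, c in zip(self.qkv, chunks):
            c = c + prev                       # cascade previous head's output
            q, k, v = blk(c).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1) / self.hd ** 0.5).softmax(-1)
            prev = attn @ v
            outs.append(prev)
        return self.proj(torch.cat(outs, dim=-1))

y = GroupAttention()(torch.randn(2, 49, 256))  # -> (2, 49, 256)
```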

EVA-CLIP: Improved training techniques for CLIP at scale

Q Sun, Y Fang, L Wu, X Wang, Y Cao - arXiv preprint arXiv:2303.15389, 2023 - arxiv.org
Contrastive language-image pre-training, CLIP for short, has gained increasing attention for
its potential in various scenarios. In this paper, we propose EVA-CLIP, a series of models …

Improving CLIP training with language rewrites

L Fan, D Krishnan, P Isola… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
Contrastive Language-Image Pre-training (CLIP) stands as one of the most effective
and scalable methods for training transferable vision models using paired image and text …
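
For orientation, the objective these CLIP papers build on is the standard symmetric contrastive (InfoNCE) loss over matched image-text pairs; a minimal PyTorch sketch follows, with batch size, embedding dimension, and temperature chosen as illustrative assumptions.

```python
# Minimal sketch of CLIP's symmetric contrastive objective.
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    # Normalize embeddings so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))              # matched pairs on the diagonal
    # Cross-entropy in both directions: image->text and text->image.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))  # random stand-in embeddings
```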

EVA-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision Computing, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …
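
A toy sketch of the masked feature prediction setup this line of work uses: mask a random subset of patch tokens and regress a frozen teacher's features at the masked positions. The linear layers stand in for the student ViT and the frozen language-aligned teacher, and all sizes are illustrative assumptions.

```python
# Toy masked-feature-prediction setup (stand-in modules, not EVA-02 itself).
import torch
import torch.nn as nn

B, N, D = 2, 196, 768                     # batch, patch tokens, embed dim
tokens = torch.randn(B, N, D)
mask = torch.rand(B, N) < 0.4             # mask ~40% of patch positions

student = nn.Linear(D, D)                            # stand-in for the student ViT
teacher = nn.Linear(D, D).requires_grad_(False)      # stand-in frozen teacher

pred = student(tokens)
with torch.no_grad():
    target = teacher(tokens)
# Regression loss only at masked positions, as in masked feature prediction.
loss = nn.functional.mse_loss(pred[mask], target[mask])
```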

MaPLe: Multi-modal prompt learning

MU Khattak, H Rasheed, M Maaz… - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023 - openaccess.thecvf.com
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …
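
A minimal sketch of the prompt-learning idea in this vein: replace hand-written prompt words with learnable context vectors prepended to each class-name embedding before the frozen text encoder. Sizes and names are illustrative assumptions, not the MaPLe implementation.

```python
# Learnable prompt context for a CLIP-like frozen text encoder (illustrative).
import torch
import torch.nn as nn

D, n_ctx, n_classes = 512, 4, 10
ctx = nn.Parameter(torch.randn(n_ctx, D) * 0.02)        # learnable context vectors
class_tok = torch.randn(n_classes, 1, D)                # fixed class-name embeddings

# Prepend the shared learnable context to every class embedding sequence.
prompts = torch.cat([ctx.unsqueeze(0).expand(n_classes, -1, -1), class_tok], dim=1)
print(prompts.shape)  # torch.Size([10, 5, 512]) -> fed to the frozen text encoder
```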