A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …
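
The survey's starting point is the generic attention operation; a minimal scaled dot-product sketch (assuming PyTorch; shapes and names are illustrative, not the survey's code) shows the common core behind the channel, spatial, and temporal variants it covers:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, tokens, dim); weights sum to 1 over the key axis.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # pairwise similarity
    weights = F.softmax(scores, dim=-1)                      # normalize over keys
    return weights @ v                                       # weighted sum of values

q = k = v = torch.randn(2, 16, 64)
out = attention(q, k, v)  # (2, 16, 64)
```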

Vision Mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …
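
As a rough intuition for what an SSM computes, here is a minimal discretized linear state-space scan plus the bidirectional combination the title refers to (a sketch assuming PyTorch; Mamba additionally makes the matrices input-dependent and uses a hardware-aware parallel scan, which this omits):

```python
import torch

def ssm_scan(x, A, B, C):
    # Linear recurrence h_t = A h_{t-1} + B x_t, output y_t = C h_t.
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:            # O(L) scan over the token sequence
        h = A @ h + B @ x_t  # state update
        ys.append(C @ h)     # readout
    return torch.stack(ys)

def bidirectional_ssm(x, A, B, C):
    # Scan forward and backward so each visual token sees both contexts.
    return ssm_scan(x, A, B, C) + ssm_scan(x.flip(0), A, B, C).flip(0)

x = torch.randn(8, 4)                          # 8 tokens, 4 channels
A = 0.1 * torch.randn(16, 16)                  # state transition
B, C = torch.randn(16, 4), torch.randn(4, 16)  # input and readout maps
y = bidirectional_ssm(x, A, B, C)              # (8, 4)
```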

MaxViT: Multi-axis vision transformer

Z Tu, H Talebi, H Zhang, F Yang, P Milanfar… - European conference on …, 2022 - Springer
Transformers have recently gained significant attention in the computer vision community.
However, the lack of scalability of self-attention mechanisms with respect to image size has …
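
The "multi-axis" design addresses that scalability issue by alternating local window attention with sparse global attention; a sketch of the two token partitions (assuming PyTorch; simplified from the paper, names are illustrative):

```python
import torch

def block_partition(x, w):
    # Non-overlapping w x w windows -> local "block" attention, linear in image size.
    B, H, W, C = x.shape
    x = x.view(B, H // w, w, W // w, w, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)

def grid_partition(x, g):
    # Dilated g x g grid: each window gathers strided positions -> sparse global mixing.
    B, H, W, C = x.shape
    x = x.view(B, g, H // g, g, W // g, C)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(-1, g * g, C)

x = torch.randn(1, 8, 8, 32)
local_tokens = block_partition(x, 4)   # (4, 16, 32)
global_tokens = grid_partition(x, 4)   # (4, 16, 32)
```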

VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training

Z Tong, Y Song, J Wang… - Advances in neural …, 2022 - proceedings.neurips.cc
Pre-training video transformers on extra-large-scale datasets is generally required to
achieve premier performance on relatively small datasets. In this paper, we show that video …
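
The data efficiency comes from reconstructing frames under an extremely high masking ratio with "tube" masking, where the same spatial patches are hidden in every frame; a minimal sketch of that mask (assuming PyTorch; ratio and shapes are illustrative):

```python
import torch

def tube_mask(num_frames, tokens_per_frame, ratio=0.9):
    # Sample one spatial mask and repeat it over time, so a masked patch
    # cannot be trivially recovered from neighboring frames.
    num_masked = int(tokens_per_frame * ratio)
    perm = torch.randperm(tokens_per_frame)
    frame_mask = torch.zeros(tokens_per_frame, dtype=torch.bool)
    frame_mask[perm[:num_masked]] = True
    return frame_mask.repeat(num_frames)  # (num_frames * tokens_per_frame,)

mask = tube_mask(num_frames=8, tokens_per_frame=196)  # ~90% of tokens hidden
```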

EfficientFormer: Vision transformers at MobileNet speed

Y Li, G Yuan, Y Wen, J Hu… - Advances in …, 2022 - proceedings.neurips.cc
Vision Transformers (ViT) have shown rapid progress in computer vision tasks,
achieving promising results on various benchmarks. However, due to the massive number of …

PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis

J Chen, J Yu, C Ge, L Yao, E Xie, Y Wu, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The most advanced text-to-image (T2I) models require significant training costs (e.g., millions
of GPU hours), seriously hindering the fundamental innovation for the AIGC community …

Scaling & shifting your features: A new baseline for efficient model tuning

D Lian, D Zhou, J Feng, X Wang - Advances in Neural …, 2022 - proceedings.neurips.cc
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-
tuning), which is not efficient, or only tune the last linear layer (linear probing), which suffers …
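
The proposed middle ground scales and shifts the frozen backbone's features with a handful of trainable per-channel parameters; a minimal sketch of that idea (assuming PyTorch; a simplification of the paper's SSF module, not its released code):

```python
import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    # Trainable per-channel affine transform inserted after frozen layers;
    # only gamma and beta are updated during fine-tuning.
    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):  # x: (..., dim)
        return x * self.gamma + self.beta

features = torch.randn(2, 16, 64)  # features from a frozen backbone
tuned = ScaleShift(64)(features)   # only 2 * 64 trainable parameters per layer
```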

Deep model reassembly

X Yang, D Zhou, S Liu, J Ye… - Advances in neural …, 2022 - proceedings.neurips.cc
In this paper, we explore a novel knowledge-transfer task, termed Deep Model
Reassembly (DeRy), for general-purpose model reuse. Given a collection of heterogeneous …

Vision transformer with deformable attention

Z Xia, X Pan, S Song, LE Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Transformers have recently shown superior performance on various vision tasks. The large,
sometimes even global, receptive field endows Transformer models with higher …
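
The deformable variant keeps that large receptive field affordable by attending at learned, data-dependent sampling positions rather than a fixed dense grid; a simplified sketch of the offset-and-sample step (assuming PyTorch; the paper's full block adds multi-head attention over the sampled features):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSampling(nn.Module):
    # Predict a 2-D offset per location, then bilinearly resample the feature
    # map at the shifted positions; keys/values are computed from the result.
    def __init__(self, dim):
        super().__init__()
        self.offset_net = nn.Conv2d(dim, 2, kernel_size=3, padding=1)

    def forward(self, x):  # x: (B, C, H, W)
        B, C, H, W = x.shape
        offsets = self.offset_net(x).permute(0, 2, 3, 1).tanh()  # (B, H, W, 2), bounded
        ys = torch.linspace(-1, 1, H, device=x.device)
        xs = torch.linspace(-1, 1, W, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).expand(B, H, W, 2)  # base grid in [-1, 1]
        return F.grid_sample(x, grid + offsets, align_corners=True)

feat = torch.randn(1, 64, 16, 16)
sampled = DeformableSampling(64)(feat)  # (1, 64, 16, 16)
```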