Towards all-in-one pre-training via maximizing multi-modal mutual information
To effectively exploit the potential of large-scale models, various pre-training strategies
supported by massive data from different sources have been proposed, including supervised pre …
Multitask vision-language prompt tuning
Prompt Tuning, conditioning on task-specific learned prompt vectors, has emerged as a data-
efficient and parameter-efficient method for adapting large pretrained vision-language …
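The mechanism the excerpt describes is simple to sketch: a small set of learnable prompt vectors is prepended to the token embeddings of a frozen pretrained encoder, and only those vectors are trained. The PyTorch snippet below is an illustrative sketch under that reading; the class name, default sizes, and the stand-in `frozen_encoder` and `embed` modules are placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn


class PromptTunedTextEncoder(nn.Module):
    """Prepend learnable prompt vectors to the token embeddings of a frozen encoder."""

    def __init__(self, frozen_encoder: nn.Module, embed: nn.Embedding,
                 num_prompts: int = 16, dim: int = 512):
        super().__init__()
        self.encoder, self.embed = frozen_encoder, embed
        for p in list(self.encoder.parameters()) + list(self.embed.parameters()):
            p.requires_grad = False                       # pretrained backbone stays frozen
        # the only trainable parameters: a handful of task-specific prompt vectors
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        tok = self.embed(token_ids)                           # (B, L, D) token embeddings
        prompts = self.prompts.expand(tok.size(0), -1, -1)    # (B, P, D), shared across the batch
        return self.encoder(torch.cat([prompts, tok], dim=1)) # prompts prepended to the sequence
```

Because only `self.prompts` receives gradients, adaptation is cheap in both data and parameters, which is the efficiency the excerpt refers to; how prompts are shared across multiple tasks, the paper's actual subject, is not shown here.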
Masked autoencoders are efficient class incremental learners
Class Incremental Learning (CIL) aims to sequentially learn new classes while
avoiding catastrophic forgetting of previous knowledge. We propose to use Masked …
Masked autoencoders in computer vision: A comprehensive survey
Z Zhou, X Liu - IEEE Access, 2023 - ieeexplore.ieee.org
The masked autoencoder (MAE) is a deep learning method based on the Transformer. Originally
used for images, it has now been extended to video, audio, and some other temporal …
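As background, the recipe the survey covers can be sketched roughly as follows: mask a large fraction of input patches, encode only the visible ones, and let a lightweight decoder reconstruct the masked patches under an MSE loss. This is a simplified illustration, not He et al.'s released code; positional embeddings and the exact patch unshuffling are omitted, and all module sizes and the masking ratio are placeholders.

```python
import torch
import torch.nn as nn


class TinyMAE(nn.Module):
    """Simplified masked autoencoder: encode visible patches, reconstruct the masked ones."""

    def __init__(self, patch_dim=768, dim=256, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.proj = nn.Linear(patch_dim, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, patch_dim)

    def forward(self, patches):                              # patches: (B, N, patch_dim)
        B, N, D = patches.shape
        n_keep = int(N * (1 - self.mask_ratio))
        order = torch.rand(B, N, device=patches.device).argsort(dim=1)  # random masking
        keep = order[:, :n_keep].unsqueeze(-1).expand(-1, -1, D)
        drop = order[:, n_keep:].unsqueeze(-1).expand(-1, -1, D)
        visible = torch.gather(patches, 1, keep)
        latent = self.encoder(self.proj(visible))            # encode visible patches only
        # append mask tokens and decode (positional embeddings omitted for brevity)
        dec_in = torch.cat([latent, self.mask_token.expand(B, N - n_keep, -1)], dim=1)
        pred = self.head(self.decoder(dec_in))[:, n_keep:]   # predictions at the mask slots
        target = torch.gather(patches, 1, drop)
        return ((pred - target) ** 2).mean()                 # MSE on masked patches only
```

Training then amounts to feeding patchified inputs and minimizing the returned loss; the high masking ratio is what keeps the encoder cheap to run during pre-training.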
Masked Autoencoders are Secretly Efficient Learners
This paper provides an efficiency study of training Masked Autoencoders (MAE), a framework
introduced by He et al. for pre-training Vision Transformers (ViTs). Our results surprisingly …
Masked autoencoding does not help natural language supervision at scale
Self-supervision and natural language supervision have emerged as two exciting ways to
train general-purpose image encoders that excel at a variety of downstream tasks. Recent …
Masked Audio Modeling with CLAP and Multi-Objective Learning
Most existing masked audio modeling (MAM) methods learn audio representations by
masking and reconstructing local spectrogram patches. However, the reconstruction loss …
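The local patch-reconstruction objective the excerpt refers to can be sketched as below, assuming a log-mel spectrogram input; the function name, patch size, masking ratio, and the `predictor` argument are illustrative only, and the CLAP-based targets and multi-objective terms the paper adds are deliberately not shown.

```python
import torch


def masked_spectrogram_loss(spec, predictor, patch=16, mask_ratio=0.8):
    """Mask local time patches of a log-mel spectrogram and score the reconstruction.

    spec: (B, n_mels, T) log-mel spectrogram; `predictor` is a stand-in for a
    masked audio model that maps the masked spectrogram back to a full one.
    """
    B, n_mels, T = spec.shape
    n_patches = T // patch
    spec = spec[..., : n_patches * patch]                    # drop the ragged tail
    # one Bernoulli masking decision per local patch along the time axis
    mask = torch.rand(B, n_patches, device=spec.device) < mask_ratio
    mask = mask.repeat_interleave(patch, dim=1)              # (B, n_patches * patch)
    mask = mask.unsqueeze(1).expand(-1, n_mels, -1)          # broadcast over mel bins
    recon = predictor(spec.masked_fill(mask, 0.0))           # reconstruct from the masked input
    return ((recon - spec) ** 2)[mask].mean()                # loss only on masked patches
```

This reproduces only the reconstruction term that the excerpt says most existing MAM methods rely on; combining it with CLAP-style semantic objectives is the multi-objective extension the paper proposes.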
Masked Image Modeling: A Survey
In this work, we survey recent studies on masked image modeling (MIM), an approach that
emerged as a powerful self-supervised learning technique in computer vision. The MIM task …
Self-supervised approach for diabetic retinopathy severity detection using vision transformer
Diabetic retinopathy (DR) is a diabetic condition that affects vision. Despite the great success
of supervised learning and Convolutional Neural Networks (CNNs), it is still challenging to …
Aerial image object detection with vision transformer detector (ViTDet)
L Wang, A Tien - IGARSS 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
The past few years have seen an increased interest in aerial image object detection due to
its critical value to large-scale geoscientific research like environmental studies, urban …