A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
A survey on self-supervised learning: Algorithms, applications, and future trends
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …
DINOv2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
ImageBind: One embedding space to bind them all
We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …
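The binding mechanism behind this entry is contrastive alignment of each non-image modality with the image embedding of the same sample, which places all six modalities in one shared space. As a hedged illustration only (the function name, toy batch, embedding size, and temperature below are assumptions, not the authors' code), a one-directional InfoNCE objective over such a space might look like:

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.07):
    """One-directional InfoNCE between paired embeddings from two
    modalities; the diagonal of the logits holds the positive pairs."""
    a = anchor / np.linalg.norm(anchor, axis=-1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=-1, keepdims=True)
    logits = a @ p.T / temperature                 # (n, n) pair scores
    log_probs = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy on positives

# Aligning each non-image modality (audio here) against paired images.
rng = np.random.default_rng(0)
img, audio = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(info_nce(img, audio))
```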
Sigmoid loss for language image pre-training
We propose a simple pairwise sigmoid loss for image-text pre-training. Unlike standard
contrastive learning with softmax normalization, the sigmoid loss operates solely on image …
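Since the snippet names the mechanism directly, a minimal NumPy sketch of a pairwise sigmoid loss may clarify the contrast with softmax normalization: every image-text pair is scored independently, so no batch-wide normalization is needed. The fixed `t` and `b` (learnable scalars in the paper) and the toy batch are illustrative assumptions.

```python
import numpy as np

def sigmoid_pairwise_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Score every image-text pair independently with a sigmoid:
    matching pairs (i == j) get label +1, all other pairs -1."""
    img = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    logits = t * img @ txt.T + b                  # (n, n) pair scores
    labels = 2.0 * np.eye(len(img)) - 1.0         # +1 diagonal, -1 off-diagonal
    # log1p(exp(-x)) == -log(sigmoid(x)); sum over pairs, average over images
    return np.mean(np.sum(np.log1p(np.exp(-labels * logits)), axis=-1))

rng = np.random.default_rng(0)
print(sigmoid_pairwise_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))))
```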
Scaling language-image pre-training via masking
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …
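The efficiency gain comes from dropping, not just zeroing, the masked patch tokens, so the image encoder processes fewer tokens per step. A minimal sketch of that random removal follows; the helper name, keep ratio, and shapes are assumptions for illustration.

```python
import numpy as np

def random_patch_subset(patches, keep_ratio=0.5, rng=None):
    """Drop a random fraction of patch tokens per image so the encoder
    only sees the kept ones. patches: (batch, num_patches, dim)."""
    rng = rng or np.random.default_rng()
    n, p, _ = patches.shape
    n_keep = int(p * keep_ratio)
    # Independent random order per image; keep the first n_keep indices.
    keep_ids = np.argsort(rng.random((n, p)), axis=1)[:, :n_keep]
    return np.take_along_axis(patches, keep_ids[:, :, None], axis=1), keep_ids

x = np.random.default_rng(0).normal(size=(2, 16, 4))
kept, ids = random_patch_subset(x)     # (2, 8, 4): half the tokens remain
print(kept.shape)
```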
Rethinking semantic segmentation: A prototype view
Prevalent semantic segmentation solutions, despite their different network designs (FCN-based
or attention-based) and mask decoding strategies (parametric softmax-based or pixel …
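The prototype view this entry argues for replaces a learned softmax classifier with nearest-prototype assignment in embedding space. A hedged sketch under assumed shapes (C classes, K prototypes per class, D-dimensional embeddings; not the paper's implementation):

```python
import numpy as np

def prototype_segment(pixel_emb, prototypes):
    """Label each pixel by its nearest class prototype under cosine
    similarity. pixel_emb: (H, W, D); prototypes: (C, K, D)."""
    px = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    pr = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    sim = np.einsum("hwd,ckd->hwck", px, pr)   # pixel-to-prototype similarity
    # A pixel's class score is its best-matching prototype of that class.
    return sim.max(axis=-1).argmax(axis=-1)    # (H, W) class ids

rng = np.random.default_rng(0)
print(prototype_segment(rng.normal(size=(8, 8, 16)),
                        rng.normal(size=(3, 4, 16))).shape)   # (8, 8)
```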
Masked autoencoders are scalable vision learners
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners
for computer vision. Our MAE approach is simple: we mask random patches of the input …
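The recipe the snippet starts to describe is: hide a large random fraction of patches (75% is the paper's default) and penalize reconstruction only on the hidden ones. A skeleton of that masking and loss step, with the encoder/decoder omitted and names chosen for illustration:

```python
import numpy as np

def mae_mask_and_loss(patches, reconstruction, mask_ratio=0.75, rng=None):
    """Hide a random fraction of patches and score the reconstruction
    only on the hidden ones. patches, reconstruction: (num_patches, dim);
    the encoder, which sees only the visible patches, is omitted here."""
    rng = rng or np.random.default_rng()
    order = rng.permutation(len(patches))
    n_masked = int(len(patches) * mask_ratio)
    masked_ids, visible_ids = order[:n_masked], order[n_masked:]
    err = patches[masked_ids] - reconstruction[masked_ids]
    return visible_ids, np.mean(err ** 2)   # MSE on masked patches only

rng = np.random.default_rng(0)
gt = rng.normal(size=(16, 8))
visible, loss = mae_mask_and_loss(gt, rng.normal(size=(16, 8)))
print(len(visible), loss)   # 4 visible patches, reconstruction error
```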
Curricular contrastive regularization for physics-aware single image dehazing
Considering the ill-posed nature of the task, contrastive regularization has been developed for
single image dehazing, introducing the information from negative images as a lower bound …
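The regularization term pulls the restored output toward the clear (positive) image while pushing it away from hazy negatives. A hedged sketch of that ratio loss follows; dehazing papers in this line typically measure the distances in pretrained-VGG feature space, so the raw-pixel L1 here is purely a stand-in assumption.

```python
import numpy as np

def contrastive_regularization(restored, clear, negatives, eps=1e-7):
    """Pull `restored` toward the clear positive, push it away from
    hazy negatives. restored, clear: (H, W, C); negatives: (K, H, W, C)."""
    d_pos = np.mean(np.abs(restored - clear))
    d_neg = np.mean(np.abs(restored[None] - negatives))
    # Minimizing the ratio tightens the positive (upper) bound while
    # keeping the negatives (lower bound) at a distance.
    return d_pos / (d_neg + eps)

rng = np.random.default_rng(0)
out, gt = rng.normal(size=(8, 8, 3)), rng.normal(size=(8, 8, 3))
print(contrastive_regularization(out, gt, rng.normal(size=(2, 8, 8, 3))))
```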
Learn from others and be yourself in heterogeneous federated learning
Federated learning has emerged as an important distributed learning paradigm, which
typically combines collaborative updating with other participants and local updating on private data …
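The collaborative step the snippet alludes to is, in the generic case, a server-side aggregation of client updates. As a sketch of that baseline pattern only (plain FedAvg, not this paper's specific handling of heterogeneity; all names and values are assumptions):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Dataset-size-weighted average of client parameters (FedAvg):
    the aggregation step between rounds of local training."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 30, 60]
print(federated_average(clients, sizes))   # pulled toward larger clients
```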