Emerging properties in self-supervised vision transformers
In this paper, we question whether self-supervised learning provides new properties to Vision
Transformers (ViT) that stand out compared to convolutional networks (convnets). Beyond the …
Robust fine-tuning of zero-shot models
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of
data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific …
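The snippet cuts off before the method, but this paper's well-known recipe (WiSE-FT) is weight-space ensembling: linearly interpolating the zero-shot and fine-tuned checkpoints. A minimal PyTorch sketch, assuming the two state dicts share keys and hold floating-point tensors (the function name and the `alpha` default are illustrative, not from the paper's code):

```python
import torch

def wise_ft(zero_shot_sd, fine_tuned_sd, alpha=0.5):
    """Interpolate two checkpoints in weight space (WiSE-FT style).

    alpha=0 recovers the zero-shot model, alpha=1 the fine-tuned one;
    intermediate values trade fine-tuned accuracy against robustness
    under distribution shift. Assumes matching keys and float tensors.
    """
    return {
        key: (1.0 - alpha) * zero_shot_sd[key] + alpha * fine_tuned_sd[key]
        for key in zero_shot_sd
    }

# usage sketch:
# model.load_state_dict(wise_ft(zs_model.state_dict(), ft_model.state_dict()))
```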
Robust design optimization and emerging technologies for electrical machines: Challenges and open problems
Bio-inspired algorithms are novel, modern, and efficient tools for the design of electrical
machines. However, from a mathematical point of view, these problems belong to the most …
Stochastic actor-oriented models for network dynamics
TAB Snijders - Annual Review of Statistics and Its Application, 2017 - annualreviews.org
This article discusses the stochastic actor-oriented model for analyzing panel data of
networks. The model is defined as a continuous-time Markov chain, observed at two or more …
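For readers new to the model, the formulation behind this snippet can be sketched as follows (notation taken from the broader SAOM literature, not from the truncated abstract): an actor i who gets a change opportunity in the continuous-time chain picks a tie change by a multinomial logit over an objective function that is linear in network statistics:

```latex
% Hedged sketch of the standard SAOM choice probability: x(i ~> j)
% denotes the network after actor i toggles the tie to actor j, and
% s_{ik} are statistics weighted by parameters \beta_k.
P\{i \text{ changes tie to } j\}
  = \frac{\exp\big(f_i(\beta, x(i \leadsto j))\big)}
         {\sum_h \exp\big(f_i(\beta, x(i \leadsto h))\big)},
\qquad
f_i(\beta, x) = \sum_k \beta_k \, s_{ik}(x).
```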
Optimization methods for large-scale machine learning
This paper provides a review and commentary on the past, present, and future of numerical
optimization algorithms in the context of machine learning applications. Through case …
Averaging weights leads to wider optima and better generalization
Deep neural networks are typically trained by optimizing a loss function with an SGD variant,
in conjunction with a decaying learning rate, until convergence. We show that simple …
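The abstract is truncated, but the titular idea (Stochastic Weight Averaging) is easy to state: keep a running average of the weights SGD visits late in training and evaluate the averaged model. A minimal sketch, with illustrative hyperparameters (`swa_start`, the learning rate) not taken from the paper:

```python
import copy
import torch

def train_with_swa(model, loader, loss_fn, epochs=100, swa_start=75, lr=0.05):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    swa_model, n_avg = copy.deepcopy(model), 0
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        if epoch >= swa_start:
            # running average of the weights along the SGD trajectory
            n_avg += 1
            for p_avg, p in zip(swa_model.parameters(), model.parameters()):
                p_avg.data += (p.data - p_avg.data) / n_avg
    return swa_model
```

PyTorch exposes the same pattern as torch.optim.swa_utils.AveragedModel; in either form, batch-norm statistics must be recomputed for the averaged weights before evaluation.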
Analyzing and improving the training dynamics of diffusion models
Diffusion models currently dominate the field of data-driven image synthesis with their
unparalleled scaling to large datasets. In this paper we identify and rectify several causes for …
A simple baseline for bayesian uncertainty in deep learning
We propose SWA-Gaussian (SWAG), a simple, scalable, and general-purpose
approach for uncertainty representation and calibration in deep learning. Stochastic Weight …
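Building on the truncated snippet: SWAG fits a Gaussian to the SGD iterates, centered at the SWA mean, with a covariance estimated from the same trajectory. A diagonal-covariance-only sketch (the paper additionally uses a low-rank term; the class and method names are illustrative):

```python
import torch

class DiagonalSWAG:
    """Track first and second moments of the flattened weights along SGD,
    then sample approximate-posterior weights from N(mean, diag variance)."""

    def __init__(self, model):
        w = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
        self.n = 0
        self.mean = torch.zeros_like(w)
        self.sq_mean = torch.zeros_like(w)

    def collect(self, model):
        # call periodically late in training, e.g. once per epoch
        w = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
        self.n += 1
        self.mean += (w - self.mean) / self.n
        self.sq_mean += (w * w - self.sq_mean) / self.n

    def sample(self):
        var = (self.sq_mean - self.mean ** 2).clamp(min=1e-30)
        return self.mean + var.sqrt() * torch.randn_like(self.mean)
```

At test time one would draw several weight samples, load each into the model with torch.nn.utils.vector_to_parameters, and average the resulting predictions.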
Lookahead optimizer: k steps forward, 1 step back
The vast majority of successful deep neural networks are trained using variants of stochastic
gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly …
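The title summarizes the algorithm: an inner "fast" optimizer takes k steps, after which the "slow" weights move a fraction alpha toward the fast weights and the fast weights restart from there. A minimal wrapper sketch (class name illustrative; k=5 and alpha=0.5 are the commonly cited defaults):

```python
import torch

class Lookahead:
    """Wrap an inner optimizer: after every k fast steps, pull the slow
    weights a fraction alpha toward the fast weights, then reset."""

    def __init__(self, optimizer, k=5, alpha=0.5):
        self.opt, self.k, self.alpha, self.step_count = optimizer, k, alpha, 0
        self.slow = [
            [p.detach().clone() for p in group["params"]]
            for group in optimizer.param_groups
        ]

    def zero_grad(self):
        self.opt.zero_grad()

    def step(self):
        self.opt.step()  # one fast step (e.g. SGD or Adam)
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow_group in zip(self.opt.param_groups, self.slow):
                for p, q in zip(group["params"], slow_group):
                    q += self.alpha * (p.detach() - q)  # slow update
                    p.data.copy_(q)                     # restart fast weights
```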
Sparsified SGD with memory
Huge-scale machine learning problems are nowadays tackled by distributed optimization
algorithms, i.e., algorithms that leverage the compute power of many devices for training. The …
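The truncated abstract sets up the distributed context; the mechanism the title refers to is top-k gradient sparsification with an error-feedback memory: coordinates that are not transmitted are kept locally and added back into the next gradient. A single-worker sketch (function name and the 1% sparsity in the usage comment are illustrative):

```python
import torch

def topk_with_memory(grad, memory, k):
    """Top-k gradient sparsification with error feedback: components that
    are not transmitted accumulate in `memory` and re-enter next step."""
    corrected = grad + memory
    idx = corrected.abs().flatten().topk(k).indices
    sparse = torch.zeros_like(corrected).flatten()
    sparse[idx] = corrected.flatten()[idx]
    sparse = sparse.view_as(corrected)
    memory.copy_(corrected - sparse)  # keep the residual for later steps
    return sparse

# per-parameter usage sketch on one worker:
# mem = torch.zeros_like(p.grad)
# p.grad = topk_with_memory(p.grad, mem, k=max(1, p.grad.numel() // 100))
```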