Emerging properties in self-supervised vision transformers

M Caron, H Touvron, I Misra, H Jégou… - Proceedings of the …, 2021 - openaccess.thecvf.com
In this paper, we question if self-supervised learning provides new properties to Vision
Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the …

Robust fine-tuning of zero-shot models

M Wortsman, G Ilharco, JW Kim, M Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of
data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific …

Robust design optimization and emerging technologies for electrical machines: Challenges and open problems

T Orosz, A Rassõlkin, A Kallaste, P Arsénio, D Pánek… - Applied Sciences, 2020 - mdpi.com
Bio-inspired algorithms are novel, modern, and efficient tools for the design of electrical
machines. However, from the mathematical point of view, these problems belong to the most …

Stochastic actor-oriented models for network dynamics

TAB Snijders - Annual review of statistics and its application, 2017 - annualreviews.org
This article discusses the stochastic actor-oriented model for analyzing panel data of
networks. The model is defined as a continuous-time Markov chain, observed at two or more …

Optimization methods for large-scale machine learning

L Bottou, FE Curtis, J Nocedal - SIAM review, 2018 - SIAM
This paper provides a review and commentary on the past, present, and future of numerical
optimization algorithms in the context of machine learning applications. Through case …

Averaging weights leads to wider optima and better generalization

P Izmailov, D Podoprikhin, T Garipov, D Vetrov… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep neural networks are typically trained by optimizing a loss function with an SGD variant,
in conjunction with a decaying learning rate, until convergence. We show that simple …
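The snippet describes Stochastic Weight Averaging (SWA): average weight snapshots collected along the SGD trajectory instead of using the final iterate. A minimal sketch of the averaging step, using plain Python lists of floats in place of framework tensors (the helper name is illustrative, not from the paper):

```python
def swa_average(weight_snapshots):
    """Element-wise average of a list of weight vectors sampled during training."""
    n = len(weight_snapshots)
    dim = len(weight_snapshots[0])
    # Average each coordinate across all snapshots.
    return [sum(w[i] for w in weight_snapshots) / n for i in range(dim)]

# Example: three snapshots taken at the end of successive epochs.
snapshots = [
    [1.0, 2.0],  # weights after epoch k
    [3.0, 4.0],  # weights after epoch k+1
    [5.0, 6.0],  # weights after epoch k+2
]
print(swa_average(snapshots))  # [3.0, 4.0]
```

In practice the snapshots are taken with a cyclical or constant learning rate late in training, and batch-norm statistics are recomputed for the averaged weights.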

Analyzing and improving the training dynamics of diffusion models

T Karras, M Aittala, J Lehtinen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models currently dominate the field of data-driven image synthesis with their
unparalleled scaling to large datasets. In this paper we identify and rectify several causes for …

A simple baseline for bayesian uncertainty in deep learning

WJ Maddox, P Izmailov, T Garipov… - Advances in neural …, 2019 - proceedings.neurips.cc
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose
approach for uncertainty representation and calibration in deep learning. Stochastic Weight …
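SWAG extends SWA by also tracking second moments of the weights along the SGD trajectory, yielding a Gaussian posterior approximation to sample from. A minimal sketch of the diagonal-covariance variant, with plain Python lists standing in for tensors (function names are illustrative):

```python
import random

def swag_diagonal_moments(weight_snapshots):
    """Return per-coordinate mean and variance of weight snapshots (diagonal SWAG)."""
    n = len(weight_snapshots)
    dim = len(weight_snapshots[0])
    mean = [sum(w[i] for w in weight_snapshots) / n for i in range(dim)]
    sq_mean = [sum(w[i] ** 2 for w in weight_snapshots) / n for i in range(dim)]
    # Variance = E[w^2] - E[w]^2, clamped at zero for numerical safety.
    var = [max(sq - m * m, 0.0) for sq, m in zip(sq_mean, mean)]
    return mean, var

def swag_sample(mean, var):
    """Draw one weight sample from the fitted diagonal Gaussian."""
    return [random.gauss(m, v ** 0.5) for m, v in zip(mean, var)]

mean, var = swag_diagonal_moments([[1.0, 0.0], [3.0, 0.0]])
print(mean, var)  # [2.0, 0.0] [1.0, 0.0]
```

At test time one averages predictions over several such weight samples to get calibrated uncertainty estimates; the full method also keeps a low-rank covariance term.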

Lookahead optimizer: k steps forward, 1 step back

M Zhang, J Lucas, J Ba… - Advances in neural …, 2019 - proceedings.neurips.cc
The vast majority of successful deep neural networks are trained using variants of stochastic
gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly …
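The title summarizes the Lookahead update rule: run k steps of a fast inner optimizer, then move the slow weights part-way toward the result. A minimal sketch under that reading, with the inner optimizer passed in as a callable (names and the toy inner step are illustrative):

```python
def lookahead_step(slow, fast_update, k, alpha=0.5):
    """One outer Lookahead step: k fast steps forward, then interpolate back."""
    fast = list(slow)          # start the fast weights at the slow weights
    for _ in range(k):
        fast = fast_update(fast)  # e.g. one SGD or Adam step
    # Slow weights move a fraction alpha toward the final fast weights.
    return [s + alpha * (f - s) for s, f in zip(slow, fast)]

# Toy inner optimizer: a fixed decrement per step (stand-in for an SGD step).
inner = lambda w: [x - 0.1 for x in w]
print(lookahead_step([1.0], inner, k=5, alpha=0.5))  # [0.75]
```

With alpha = 1 this reduces to plain use of the inner optimizer; smaller alpha damps the variance of the fast weights.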

Sparsified SGD with memory

SU Stich, JB Cordonnier… - Advances in neural …, 2018 - proceedings.neurips.cc
Huge scale machine learning problems are nowadays tackled by distributed optimization
algorithms, i.e., algorithms that leverage the compute power of many devices for training. The …
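The method in this entry sparsifies each gradient before communication and keeps the discarded coordinates in a memory (error-feedback) term that is added back at the next step. A minimal sketch of top-k sparsification with memory, using plain Python lists (function and variable names are illustrative):

```python
def topk_with_memory(grad, memory, k):
    """Transmit only the k largest-magnitude coordinates; remember the rest."""
    # Error feedback: fold the previously untransmitted residual into the gradient.
    corrected = [g + m for g, m in zip(grad, memory)]
    # Indices of the k largest-magnitude entries.
    idx = set(sorted(range(len(corrected)),
                     key=lambda i: abs(corrected[i]), reverse=True)[:k])
    sparse = [corrected[i] if i in idx else 0.0 for i in range(len(corrected))]
    # Whatever was not transmitted stays in memory for the next iteration.
    new_memory = [c - s for c, s in zip(corrected, sparse)]
    return sparse, new_memory

g = [3.0, -1.0, 0.5]
sparse, mem = topk_with_memory(g, [0.0, 0.0, 0.0], k=1)
print(sparse, mem)  # [3.0, 0.0, 0.0] [0.0, -1.0, 0.5]
```

The memory term is what lets this sparsified SGD match the convergence rate of dense SGD: dropped coordinates are delayed, not lost.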