The mechanism of prediction head in non-contrastive self-supervised learning
The surprising discovery of the BYOL method shows that negative samples can be replaced
by adding a prediction head to the network. It is mysterious why, even when there exist …
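As a rough illustration of the non-contrastive setup this abstract refers to, the sketch below trains an online encoder plus prediction head to match a stop-gradient momentum target on two views, with no negative samples. The layer sizes, learning rate, and momentum value are illustrative assumptions, not the paper's configuration.

```python
# Minimal BYOL-style sketch (assumed architecture, not the paper's): the online
# branch has an extra prediction head; the target branch is a momentum copy
# that receives no gradient. No negative samples are used.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, proj_dim = 128, 64
encoder = nn.Sequential(nn.Linear(dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
predictor = nn.Sequential(nn.Linear(proj_dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
target_encoder = nn.Sequential(nn.Linear(dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
target_encoder.load_state_dict(encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-2)

def byol_loss(view1, view2):
    # Predict the target projection of view2 from the online projection of view1;
    # detach() implements the stop-gradient on the target branch.
    p1 = F.normalize(predictor(encoder(view1)), dim=-1)
    z2 = F.normalize(target_encoder(view2), dim=-1).detach()
    return (2 - 2 * (p1 * z2).sum(dim=-1)).mean()

x = torch.randn(32, dim)                                  # stand-in batch
loss = byol_loss(x + 0.1 * torch.randn_like(x),           # two "augmented" views
                 x + 0.1 * torch.randn_like(x))
loss.backward()
opt.step()

with torch.no_grad():                                     # EMA update of the target
    for t, o in zip(target_encoder.parameters(), encoder.parameters()):
        t.mul_(0.99).add_(0.01 * o)
```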
Provable benefits of annealing for estimating normalizing constants: Importance Sampling, Noise-Contrastive Estimation, and beyond
Recent research has developed several Monte Carlo methods for estimating the
normalization constant (partition function) based on the idea of annealing. This means …
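For context, the sketch below is the plain single-proposal importance-sampling estimator of a normalizing constant, $Z = \mathbb{E}_q[\tilde p(x)/q(x)]$; the annealed estimators studied in the paper replace the single proposal with a sequence of intermediate distributions bridging $q$ and the target, which is not shown. The 1-D target and proposal are toy assumptions.

```python
# Minimal importance-sampling sketch for a normalizing constant (single
# proposal, no annealing): Z = E_q[ p_tilde(x) / q(x) ], estimated by Monte
# Carlo with samples from a tractable proposal q. Toy 1-D choices throughout.
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # Unnormalized target: Gaussian shape with std 0.5, so the true
    # normalizing constant is 0.5 * sqrt(2 * pi) ~= 1.2533.
    return np.exp(-x**2 / (2 * 0.5**2))

def q_pdf(x):
    # Proposal: standard normal, deliberately wider than the target so the
    # importance weights stay well behaved. When q and the target are far
    # apart the weights degenerate, which is what annealing is meant to address.
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

x = rng.normal(size=100_000)            # samples from q
weights = p_tilde(x) / q_pdf(x)         # importance weights
print("IS estimate of Z:", weights.mean())
print("true Z          :", 0.5 * np.sqrt(2 * np.pi))
```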
Estimating the density ratio between distributions with high discrepancy using multinomial logistic regression
Functions of the ratio of the densities $p/q$ are widely used in machine learning to quantify
the discrepancy between the two distributions $p$ and $q$. For high-dimensional …
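As background for the abstract above, the sketch below uses the standard single-classifier trick for density-ratio estimation: a binary logistic regression separating samples of $p$ from samples of $q$, whose logit approximates $\log(p/q)$ when the two sample sizes are equal. The paper's multinomial, multi-class extension for high-discrepancy pairs is not reproduced here; the Gaussian distributions are toy assumptions.

```python
# Minimal density-ratio sketch via binary logistic regression (the standard
# single-classifier trick, not the paper's multinomial scheme). With equal
# sample sizes from p and q, the classifier's logit estimates log p(x) - log q(x);
# unequal sizes would need an extra log(n_q / n_p) offset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
xp = rng.normal(loc=1.0, scale=1.0, size=(n, 1))     # samples from p = N(1, 1)
xq = rng.normal(loc=0.0, scale=1.0, size=(n, 1))     # samples from q = N(0, 1)

X = np.vstack([xp, xq])
y = np.concatenate([np.ones(n), np.zeros(n)])        # label 1 = "drawn from p"
clf = LogisticRegression().fit(X, y)

x0 = np.array([[0.5]])
log_ratio_hat = clf.decision_function(x0)[0]                # estimated log p(x0)/q(x0)
log_ratio_true = (-(x0 - 1.0)**2 / 2 + x0**2 / 2).item()    # analytic value: x0 - 0.5
print(log_ratio_hat, log_ratio_true)
```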
Revisiting energy based models as policies: Ranking noise contrastive estimation and interpolating energy models
A crucial design decision for any robot learning pipeline is the choice of policy
representation: what type of model should be used to generate the next set of robot actions …
InfoNCE: Identifying the Gap Between Theory and Practice
Previous theoretical work on contrastive learning (CL) with InfoNCE showed that, under
certain assumptions, the learned representations uncover the ground-truth latent factors. We …
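For reference, a minimal sketch of the InfoNCE objective discussed here: each anchor's positive (a second view of the same input) is classified against the other samples in the batch via a softmax over scaled cosine similarities. The toy inputs and the temperature value are assumptions, not the settings analyzed in the paper.

```python
# Minimal InfoNCE sketch: positives sit on the diagonal of the similarity
# matrix between two batches of views; all other entries act as negatives.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # z1[i] and z2[i] are representations of two views of the same input.
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature        # (batch, batch) cosine similarities
    labels = torch.arange(z1.shape[0])        # correct "class" is the matching view
    return F.cross_entropy(logits, labels)

z1 = torch.randn(16, 32)
z2 = z1 + 0.1 * torch.randn_like(z1)          # stand-in for an augmented view
print(info_nce(z1, z2).item())
```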
Latent energy-based odyssey: Black-box optimization via expanded exploration in the energy-based latent space
Offline Black-Box Optimization (BBO) aims to optimize a black-box function using
knowledge from a pre-collected offline dataset of function values and corresponding input …
Learning unnormalized statistical models via compositional optimization
Learning unnormalized statistical models (e.g., energy-based models) is computationally
challenging due to the complexity of handling the partition function. To eschew this …
Statistical applications of contrastive learning
The likelihood function plays a crucial role in statistical inference and experimental design.
However, it is computationally intractable for several important classes of statistical models …
Pitfalls of Gaussians as a noise distribution in NCE
Noise Contrastive Estimation (NCE) is a popular approach for learning probability density
functions parameterized up to a constant of proportionality. The main idea is to design a …
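To make the setup concrete, the sketch below fits a 1-D unnormalized model by plain NCE: logistic classification of data against samples from a Gaussian noise distribution with known density, with the log normalizing constant learned as an extra parameter. The toy data, the quadratic model, and the optimizer are illustrative assumptions; the paper examines what can go wrong with exactly this kind of Gaussian noise choice.

```python
# Minimal NCE sketch: log p_theta(x) = -softplus(a) * x^2 + b * x - c, where c
# plays the role of the learned log normalizing constant. Data (label 1) is
# classified against Gaussian noise (label 0) whose density is known in closed
# form. For data ~ N(2, 1.5^2) the quadratic coefficient should approach
# 1 / (2 * 1.5^2) ~= 0.222 and the linear one 2 / 1.5^2 ~= 0.889.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
data = 1.5 * torch.randn(4000, 1) + 2.0
mu, sigma = data.mean(), data.std()
noise = sigma * torch.randn(4000, 1) + mu           # Gaussian noise samples

def log_noise(x):                                   # known Gaussian log-density
    return -0.5 * ((x - mu) / sigma) ** 2 - torch.log(sigma) - 0.5 * math.log(2 * math.pi)

theta = torch.zeros(3, requires_grad=True)          # [a, b, c]
def log_model(x):
    a, b, c = theta
    return -F.softplus(a) * x**2 + b * x - c

opt = torch.optim.Adam([theta], lr=0.05)
labels = torch.cat([torch.ones(len(data)), torch.zeros(len(noise))])
for _ in range(500):
    # NCE logit is the log-ratio of model to noise density; optimize the
    # resulting binary cross-entropy on the data-vs-noise labels.
    logits = torch.cat([log_model(data) - log_noise(data),
                        log_model(noise) - log_noise(noise)]).squeeze(-1)
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("quadratic coeff:", F.softplus(theta[0]).item(), "linear coeff:", theta[1].item())
```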
Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation
Self-supervised learning is an increasingly popular approach to unsupervised learning,
achieving state-of-the-art results. A prevalent approach consists in contrasting data points …