The mechanism of prediction head in non-contrastive self-supervised learning

Z Wen, Y Li - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
The surprising discovery of the BYOL method shows the negative samples can be replaced
by adding the prediction head to the network. It is mysterious why even when there exist …

Provable benefits of annealing for estimating normalizing constants: Importance Sampling, Noise-Contrastive Estimation, and beyond

O Chehab, A Hyvarinen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent research has developed several Monte Carlo methods for estimating the
normalization constant (partition function) based on the idea of annealing. This means …
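The annealing methods in this entry build on the basic importance sampling estimate of a normalizing constant, $Z = \mathbb{E}_{x \sim q}[f(x)/q(x)]$, for an unnormalized density $f$ and a normalized proposal $q$. The sketch below shows only that plain, non-annealed estimator under assumed choices: a 1-D unnormalized Gaussian target whose true $Z = \sqrt{2\pi}$ is known, and a wider Gaussian proposal. All function names are hypothetical and not taken from the paper.

```python
import math
import random

def unnormalized_target(x):
    # Assumed target f(x) = exp(-x^2 / 2); its true normalizing constant is sqrt(2*pi)
    return math.exp(-0.5 * x * x)

def proposal_pdf(x, sigma):
    # Normalized Gaussian density N(0, sigma^2), used as the proposal q
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_sampling_Z(n_samples, sigma=2.0, seed=0):
    """Estimate Z = integral of f(x) dx as the sample mean of f(x)/q(x), with x ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        total += unnormalized_target(x) / proposal_pdf(x, sigma)
    return total / n_samples

if __name__ == "__main__":
    estimate = importance_sampling_Z(100_000)
    print(f"estimate={estimate:.4f}  true={math.sqrt(2.0 * math.pi):.4f}")
```

The proposal is deliberately wider than the target so the weights $f/q$ stay bounded; annealing schemes address the case where no single proposal overlaps the target this well.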

Estimating the density ratio between distributions with high discrepancy using multinomial logistic regression

A Srivastava, S Han, K Xu, B Rhodes… - arXiv preprint arXiv …, 2023 - arxiv.org
Functions of the ratio of the densities $p/q$ are widely used in machine learning to quantify
the discrepancy between the two distributions $p$ and $q$. For high-dimensional …
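A standard way to estimate $p/q$ is to train a probabilistic classifier to separate samples of $p$ from samples of $q$: with equal sample sizes, the optimal classifier logit equals $\log p(x)/q(x)$. The sketch below shows this basic binary logistic-regression version, not the multinomial method named in this entry's title. It assumes $p = N(1, 1)$ and $q = N(0, 1)$, for which the true log-ratio is $x - 1/2$, so a linear logit is well specified; all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ratio_classifier(xs_p, xs_q, lr=1.0, iters=300):
    """Fit logit(x) = w0 + w1*x by full-batch gradient descent on the log-loss.

    Label 1 marks samples from p, label 0 samples from q. With equal sample
    sizes, the optimal logit is log p(x)/q(x).
    """
    data = [(x, 1.0) for x in xs_p] + [(x, 0.0) for x in xs_q]
    n = len(data)
    w0, w1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in data:
            err = sigmoid(w0 + w1 * x) - y  # derivative of log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return w0, w1

def estimated_ratio(x, w0, w1):
    # p(x)/q(x) is estimated as exp(logit(x))
    return math.exp(w0 + w1 * x)

if __name__ == "__main__":
    rng = random.Random(0)
    xs_p = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # samples from p = N(1, 1)
    xs_q = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # samples from q = N(0, 1)
    w0, w1 = fit_ratio_classifier(xs_p, xs_q)
    # True log-ratio is x - 0.5, so w1 should be near 1 and w0 near -0.5
    print(f"w0={w0:.3f}  w1={w1:.3f}")
```

This single binary classifier is reliable here because $p$ and $q$ overlap heavily; the high-discrepancy regime the entry targets is where it degrades.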

Revisiting energy based models as policies: Ranking noise contrastive estimation and interpolating energy models

S Singh, S Tu, V Sindhwani - arXiv preprint arXiv:2309.05803, 2023 - arxiv.org
A crucial design decision for any robot learning pipeline is the choice of policy
representation: what type of model should be used to generate the next set of robot actions …

InfoNCE: Identifying the Gap Between Theory and Practice

E Rusak, P Reizinger, A Juhos, O Bringmann… - arXiv preprint arXiv …, 2024 - arxiv.org
Previous theoretical work on contrastive learning (CL) with InfoNCE showed that, under
certain assumptions, the learned representations uncover the ground-truth latent factors. We …
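For reference, a common form of the InfoNCE loss scores each anchor against its own positive and treats the other positives in the batch as negatives: $\ell_i = -\log \frac{\exp(s_{ii}/\tau)}{\sum_j \exp(s_{ij}/\tau)}$, where $s_{ij}$ is the cosine similarity and $\tau$ a temperature. The sketch below is a plain-Python illustration under those assumptions; the exact setup analyzed in the paper may differ, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: positives[i] is the positive for anchors[i];
    every other positives[j] in the batch serves as a negative."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n

if __name__ == "__main__":
    basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(info_nce(basis, basis))  # well-separated matched pairs give a loss near 0
```

When all similarities are equal the loss is exactly $\log N$ for a batch of size $N$, which is a handy sanity check.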

Latent energy-based odyssey: Black-box optimization via expanded exploration in the energy-based latent space

P Yu, D Zhang, H He, X Ma, R Miao, Y Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the
knowledge from a pre-collected offline dataset of function values and corresponding input …

Learning unnormalized statistical models via compositional optimization

W Jiang, J Qin, L Wu, C Chen… - … on Machine Learning, 2023 - proceedings.mlr.press
Learning unnormalized statistical models (e.g., energy-based models) is computationally
challenging due to the complexity of handling the partition function. To eschew this …

Statistical applications of contrastive learning

MU Gutmann, S Kleinegesse, B Rhodes - Behaviormetrika, 2022 - Springer
The likelihood function plays a crucial role in statistical inference and experimental design.
However, it is computationally intractable for several important classes of statistical models …

Pitfalls of Gaussians as a noise distribution in NCE

H Lee, C Pabbaraju, A Sevekari, A Risteski - arXiv preprint arXiv …, 2022 - arxiv.org
Noise Contrastive Estimation (NCE) is a popular approach for learning probability density
functions parameterized up to a constant of proportionality. The main idea is to design a …
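The core NCE idea is to fit an unnormalized model by logistic regression that separates data from samples of a known noise distribution $q$. With equally many data and noise samples, the classifier logit is $G(x) = \log \phi_\theta(x) - \log q(x)$, and the normalizer is learned as an ordinary parameter. The sketch below assumes a 1-D model $\log \phi(x) = -a x^2/2 + c$, data from $N(0, 1)$, and Gaussian noise $N(0, 2^2)$ (the noise family this entry examines); it is an illustration, not the paper's experimental setup, and the names are hypothetical.

```python
import math
import random

NOISE_STD = 2.0  # assumed noise distribution q = N(0, NOISE_STD^2)

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_noise(x):
    # Normalized log-density of the noise q = N(0, NOISE_STD^2)
    return -0.5 * (x / NOISE_STD) ** 2 - math.log(NOISE_STD * math.sqrt(2.0 * math.pi))

def fit_nce(data, noise, lr=0.5, iters=500):
    """Fit log phi(x) = -a*x^2/2 + c by NCE with one noise sample per data sample.

    c plays the role of the unknown log-normalizer and is learned like any
    other parameter, by gradient descent on the logistic loss.
    """
    # Tuples of (x, label, log q(x)); label 1 = data, 0 = noise
    samples = [(x, 1.0, log_noise(x)) for x in data] + [(y, 0.0, log_noise(y)) for y in noise]
    n = len(samples)
    a, c = 0.5, 0.0  # deliberately wrong starting point
    for _ in range(iters):
        grad_a = grad_c = 0.0
        for x, label, log_q in samples:
            logit = (-0.5 * a * x * x + c) - log_q  # G(x) = log phi(x) - log q(x)
            err = sigmoid(logit) - label            # d(logistic loss)/d(logit)
            grad_a += err * (-0.5 * x * x)          # d(logit)/da
            grad_c += err                           # d(logit)/dc
        a -= lr * grad_a / n
        c -= lr * grad_c / n
    return a, c

if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(1000)]        # data ~ N(0, 1)
    noise = [rng.gauss(0.0, NOISE_STD) for _ in range(1000)]  # noise ~ q
    a, c = fit_nce(data, noise)
    print(f"a={a:.3f} (true 1.0)  c={c:.3f} (true {-0.5 * math.log(2.0 * math.pi):.3f})")
```

If the fit succeeds, $a$ approaches 1 and $c$ approaches $-\log\sqrt{2\pi} \approx -0.919$, i.e. NCE recovers the normalizing constant without ever computing it directly.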

Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation

O Chehab, A Gramfort, A Hyvarinen - arXiv preprint arXiv:2301.09696, 2023 - arxiv.org
Self-supervised learning is an increasingly popular approach to unsupervised learning,
achieving state-of-the-art results. A prevalent approach consists in contrasting data points …