iBOT: Image BERT pre-training with online tokenizer

J Zhou, C Wei, H Wang, W Shen, C Xie, A Yuille… - arXiv preprint arXiv …, 2021 - arxiv.org
The success of language Transformers is primarily attributed to the pretext task of masked
language modeling (MLM), where texts are first tokenized into semantically meaningful …

Delving into out-of-distribution detection with vision-language representations

Y Ming, Z Cai, J Gu, Y Sun, W Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recognizing out-of-distribution (OOD) samples is critical for machine learning systems
deployed in the open world. The vast majority of OOD detection methods are driven by a …

Last layer re-training is sufficient for robustness to spurious correlations

P Kirichenko, P Izmailov, AG Wilson - arXiv preprint arXiv:2204.02937, 2022 - arxiv.org
Neural network classifiers can largely rely on simple spurious features, such as
backgrounds, to make predictions. However, even in these cases, we show that they still …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Rethinking spatial dimensions of vision transformers

B Heo, S Yun, D Han, S Chun… - Proceedings of the …, 2021 - openaccess.thecvf.com
Vision Transformer (ViT) extends the application range of transformers from
language processing to computer vision tasks as an alternative architecture against …

Causal machine learning: A survey and open problems

J Kaddour, A Lynch, Q Liu, MJ Kusner… - arXiv preprint arXiv …, 2022 - arxiv.org
Causal Machine Learning (CausalML) is an umbrella term for machine learning methods
that formalize the data-generation process as a structural causal model (SCM). This …

On feature learning in the presence of spurious correlations

P Izmailov, P Kirichenko, N Gruver… - Advances in Neural …, 2022 - proceedings.neurips.cc
Deep classifiers are known to rely on spurious features—patterns which are correlated with
the target on the training data but not inherently relevant to the learning problem, such as the …

SWAD: Domain generalization by seeking flat minima

J Cha, S Chun, K Lee, HC Cho… - Advances in Neural …, 2021 - proceedings.neurips.cc
Domain generalization (DG) methods aim to achieve generalizability to an unseen
target domain by using only training data from the source domains. Although a variety of DG …

A fine-grained analysis on distribution shift

O Wiles, S Gowal, F Stimberg, SA Rebuffi… - arXiv preprint arXiv …, 2021 - arxiv.org
Robustness to distribution shifts is critical for deploying machine learning models in the real
world. Despite this necessity, there has been little work in defining the underlying …

Change is hard: A closer look at subpopulation shift

Y Yang, H Zhang, D Katabi, M Ghassemi - arXiv preprint arXiv:2302.12254, 2023 - arxiv.org
Machine learning models often perform poorly on subgroups that are underrepresented in
the training data. Yet, little is understood about the variation in mechanisms that cause …