Hyperbolic contrastive learning for visual representations beyond objects

S Ge, S Mishra, S Kornblith, CL Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Although self-/un-supervised methods have led to rapid progress in visual representation
learning, these methods generally treat objects and scenes using the same lens. In this …

In or out? Fixing ImageNet out-of-distribution detection evaluation

J Bitterwolf, M Mueller, M Hein - arXiv preprint arXiv:2306.00826, 2023 - arxiv.org
Out-of-distribution (OOD) detection is the problem of identifying inputs which are unrelated to
the in-distribution task. The OOD detection performance when the in-distribution (ID) is …

GeneCIS: A benchmark for general conditional image similarity

S Vaze, N Carion, I Misra - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
We argue that there are many notions of 'similarity' and that models, like humans, should be
able to adapt to these dynamically. This contrasts with most representation learning …

Distilling model failures as directions in latent space

S Jain, H Lawrence, A Moitra, A Madry - arXiv preprint arXiv:2206.14754, 2022 - arxiv.org
Existing methods for isolating hard subpopulations and spurious correlations in datasets
often require human intervention. This can make these methods labor-intensive and dataset …

Effective human-AI teams via learned natural language rules and onboarding

H Mozannar, J Lee, D Wei, P Sattigeri… - Advances in …, 2024 - proceedings.neurips.cc
People are relying on AI agents to assist them with various tasks. The human must know
when to rely on the agent, collaborate with the agent, or ignore its suggestions. In this work …

The song describer dataset: a corpus of audio captions for music-and-language evaluation

I Manco, B Weck, S Doh, M Won, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality
audio-caption pairs, designed for the evaluation of music-and-language models. The …

Analyzing dataset annotation quality management in the wild

JC Klie, RE Castilho, I Gurevych - Computational Linguistics, 2024 - direct.mit.edu
Data quality is crucial for training accurate, unbiased, and trustworthy machine learning
models as well as for their correct evaluation. Recent work, however, has shown that even …

Automated classification of model errors on ImageNet

M Peychev, M Müller, M Fischer… - Advances in Neural …, 2024 - proceedings.neurips.cc
While the ImageNet dataset has been driving computer vision research over the past
decade, significant label noise and ambiguity have made top-1 accuracy an insufficient …

Understanding the detrimental class-level effects of data augmentation

P Kirichenko, M Ibrahim, R Balestriero… - Advances in …, 2024 - proceedings.neurips.cc
Data augmentation (DA) encodes invariance and provides implicit regularization critical to a
model's performance in image classification tasks. However, while DA improves average …

Understanding and mitigating the label noise in pre-training on downstream tasks

H Chen, J Wang, A Shah, R Tao, H Wei, X Xie… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-training on large-scale datasets and then fine-tuning on downstream tasks have
become a standard practice in deep learning. However, pre-training data often contain label …