Targeted supervised contrastive learning for long-tailed recognition

T Li, P Cao, Y Yuan, L Fan, Y Yang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Real-world data often exhibits long-tailed distributions with heavy class imbalance, where the
majority classes can dominate the training process and alter the decision boundaries of the …

Self-supervised video pretraining yields robust and more human-aligned visual representations

N Parthasarathy, SM Eslami… - Advances in Neural …, 2023 - proceedings.neurips.cc
Humans learn powerful representations of objects and scenes by observing how they evolve
over time. Yet, outside of specific tasks that require explicit temporal understanding, static …

Locality-aware inter- and intra-video reconstruction for self-supervised correspondence learning

L Li, T Zhou, W Wang, L Yang, J Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Our target is to learn visual correspondence from unlabeled videos. We develop LIIR, a
locality-aware inter- and intra-video reconstruction framework that fills in three missing …

Self-supervised visual learning from interactions with objects

A Aubret, C Teulière, J Triesch - European Conference on Computer …, 2024 - Springer
Self-supervised learning (SSL) has revolutionized visual representation learning, but has
not achieved the robustness of human vision. A reason for this could be that SSL does not …

Enhanced long-tailed recognition with contrastive cutmix augmentation

H Pan, Y Guo, M Yu, J Chen - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
Real-world data often follows a long-tailed distribution, where a few head classes occupy
most of the data and a large number of tail classes only contain very limited samples. In …

ViC-MAE: Self-supervised representation learning from images and video with contrastive masked autoencoders

J Hernandez, R Villegas, V Ordonez - European Conference on Computer …, 2024 - Springer
We propose ViC-MAE, a model that combines both Masked AutoEncoders (MAE) and
contrastive learning. ViC-MAE is trained using a global representation obtained by pooling …

Self-supervised video pretraining yields human-aligned visual representations

N Parthasarathy, SM Eslami, J Carreira… - arXiv preprint arXiv …, 2022 - arxiv.org
Humans learn powerful representations of objects and scenes by observing how they evolve
over time. Yet, outside of specific tasks that require explicit temporal understanding, static …

Contextual Augmented Global Contrast for Multimodal Intent Recognition

K Sun, Z Xie, M Ye, H Zhang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Multimodal intent recognition (MIR) aims to perceive the human intent polarity via language,
visual, and acoustic modalities. The inherent intent ambiguity makes it challenging to …

Self-supervised learning for rolling shutter temporal super-resolution

B Fan, Y Guo, Y Dai, C Xu, B Shi - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Most cameras on portable devices adopt a rolling shutter (RS) mechanism, encoding
sufficient temporal dynamic information through sequential readouts. This advantage can be …

Contrastive learning of person-independent representations for facial action unit detection

Y Li, S Shan - IEEE Transactions on Image Processing, 2023 - ieeexplore.ieee.org
Facial action unit (AU) detection, aiming to classify AU present in the facial image, has long
suffered from insufficient AU annotations. In this paper, we aim to mitigate this data scarcity …