Targeted supervised contrastive learning for long-tailed recognition
Real-world data often exhibits long-tailed distributions with heavy class imbalance, where the
majority classes can dominate the training process and alter the decision boundaries of the …
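Several entries in this list build on supervised contrastive learning for imbalanced data. For reference, here is a minimal NumPy sketch of the standard supervised contrastive (SupCon) loss that these methods extend; the "targeted" modification of the paper above is not reproduced, and the function name, signature, and temperature value are illustrative assumptions, not the authors' code.

```python
import numpy as np

def supcon_loss(features, labels, tau=0.1):
    """Supervised contrastive (SupCon) loss over a batch.

    features: (N, D) array of embeddings; labels: (N,) int array.
    Each anchor is pulled toward all same-class samples and pushed
    from the rest, at temperature tau.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / tau                      # pairwise cosine similarities
    n = len(labels)
    eye = np.eye(n, dtype=bool)
    sim = np.where(eye, -np.inf, sim)        # exclude self from the denominator
    # numerically stable log-softmax over all other samples
    m = sim.max(axis=1, keepdims=True)
    log_denom = m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    pos = (labels[:, None] == labels[None, :]) & ~eye   # same-class pairs
    # mean negative log-probability of positives, averaged over anchors
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

Under class imbalance, the positive sets `pos` for tail classes are tiny, which is exactly the failure mode that targeted and re-weighted variants of this loss address.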
Self-supervised video pretraining yields robust and more human-aligned visual representations
Humans learn powerful representations of objects and scenes by observing how they evolve
over time. Yet, outside of specific tasks that require explicit temporal understanding, static …
Locality-aware inter- and intra-video reconstruction for self-supervised correspondence learning
Our target is to learn visual correspondence from unlabeled videos. We develop LIIR, a
locality-aware inter- and intra-video reconstruction framework that fills in three missing …
Self-supervised visual learning from interactions with objects
Self-supervised learning (SSL) has revolutionized visual representation learning, but has
not achieved the robustness of human vision. A reason for this could be that SSL does not …
Enhanced long-tailed recognition with contrastive cutmix augmentation
H Pan, Y Guo, M Yu, J Chen - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
Real-world data often follows a long-tailed distribution, where a few head classes occupy
most of the data and a large number of tail classes only contain very limited samples. In …
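The CutMix augmentation this title builds on can be sketched on its own; the snippet does not describe the paper's contrastive variant, so the version below is plain CutMix (pasting a random box from one image into another and mixing the labels in proportion to the pasted area). The function name, signature, and the explicit `rng` parameter are assumptions for illustration.

```python
import numpy as np

def cutmix(img_a, lab_a, img_b, lab_b, lam, rng=np.random):
    """Plain CutMix: paste a random box from img_b into img_a.

    img_*: (H, W, C) arrays; lab_*: one-hot label vectors.
    lam is the target fraction of img_a to keep; the returned label
    is mixed by the actual (clipped) pasted area.
    """
    h, w = img_a.shape[:2]
    # box side lengths chosen so the box area fraction is about (1 - lam)
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.randint(h), rng.randint(w)       # random box center
    r1, r2 = max(cy - rh // 2, 0), min(cy + rh // 2, h)
    c1, c2 = max(cx - rw // 2, 0), min(cx + rw // 2, w)
    out = img_a.copy()
    out[r1:r2, c1:c2] = img_b[r1:r2, c1:c2]
    lam_adj = 1 - (r2 - r1) * (c2 - c1) / (h * w)  # fraction of img_a kept
    return out, lam_adj * lab_a + (1 - lam_adj) * lab_b
```

For long-tailed recognition, the appeal of this operation is that tail-class patches can be pasted into head-class images, synthesizing extra tail-class signal without new annotations.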
ViC-MAE: Self-supervised representation learning from images and video with contrastive masked autoencoders
We propose ViC-MAE, a model that combines both Masked AutoEncoders (MAE) and
contrastive learning. ViC-MAE is trained using a global representation obtained by pooling …
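The snippet says ViC-MAE obtains a global representation by pooling over patch tokens and trains it contrastively alongside masked reconstruction. The excerpt gives no further detail, so the sketch below assumes MAE-style random masking and mean pooling followed by L2 normalization; the helper names and the masking ratio are assumptions, not the paper's implementation.

```python
import numpy as np

def mask_patches(tokens, mask_ratio=0.75, rng=np.random):
    """MAE-style random masking: keep a random subset of patch tokens.

    tokens: (num_patches, D) array of per-patch features.
    """
    n = tokens.shape[0]
    keep = max(1, int(round(n * (1 - mask_ratio))))
    idx = rng.permutation(n)[:keep]          # indices of visible patches
    return tokens[idx]

def global_representation(tokens):
    """Mean-pool visible patch tokens into one global vector and
    L2-normalize it, as a contrastive head would consume it."""
    g = tokens.mean(axis=0)
    return g / np.linalg.norm(g)
```

Pairs of such global vectors (e.g. from two frames of the same video, or two crops of the same image) can then be fed to any standard contrastive loss, while the reconstruction loss operates on the masked-out patches.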
Contextual Augmented Global Contrast for Multimodal Intent Recognition
Multimodal intent recognition (MIR) aims to perceive the human intent polarity via the language,
visual, and acoustic modalities. The inherent intent ambiguity makes it challenging to …
Self-supervised learning for rolling shutter temporal super-resolution
Most cameras on portable devices adopt a rolling shutter (RS) mechanism, encoding
sufficient temporal dynamic information through sequential readouts. This advantage can be …
Contrastive learning of person-independent representations for facial action unit detection
Facial action unit (AU) detection, which aims to classify the AUs present in a facial image, has long
suffered from insufficient AU annotations. In this paper, we aim to mitigate this data scarcity …