A systematic literature review on multimodal machine learning: Applications, challenges, gaps and future directions

A Barua, MU Ahmed, S Begum - IEEE Access, 2023 - ieeexplore.ieee.org
Multimodal machine learning (MML) is a tempting multidisciplinary research area where
heterogeneous data from multiple modalities and machine learning (ML) are combined to …

A review of predictive and contrastive self-supervised learning for medical images

WC Wang, E Ahn, D Feng, J Kim - Machine Intelligence Research, 2023 - Springer
Over the last decade, supervised deep learning on manually annotated big data has been
progressing significantly on computer vision tasks. But, the application of deep learning in …

With a little help from my friends: Nearest-neighbor contrastive learning of visual representations

D Dwibedi, Y Aytar, J Tompson… - Proceedings of the …, 2021 - openaccess.thecvf.com
Self-supervised learning algorithms based on instance discrimination train encoders to be
invariant to pre-defined transformations of the same instance. While most methods treat …

MimicPlay: Long-horizon imitation learning by watching human play

C Wang, L Fan, J Sun, R Zhang, L Fei-Fei, D Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Imitation learning from human demonstrations is a promising paradigm for teaching robots
manipulation skills in the real world. However, learning complex long-horizon tasks often …

Spatiotemporal contrastive video representation learning

R Qian, T Meng, B Gong, MH Yang… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present a self-supervised Contrastive Video Representation Learning (CVRL) method to
learn spatiotemporal visual representations from unlabeled videos. Our representations are …

Self-supervised learning of pretext-invariant representations

I Misra, L van der Maaten - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
The goal of self-supervised learning from images is to construct image representations that
are semantically meaningful via pretext tasks that do not require semantic annotations. Many …

SkeletonMAE: Graph-based masked autoencoder for skeleton sequence pre-training

H Yan, Y Liu, Y Wei, Z Li, G Li… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Skeleton sequence representation learning has shown great advantages for action
recognition due to its promising ability to model human joints and topology. However, the …

VideoMoCo: Contrastive video representation learning with temporally adversarial examples

T Pan, Y Song, T Yang, W Jiang… - Proceedings of the …, 2021 - openaccess.thecvf.com
MoCo is effective for unsupervised image representation learning. In this paper, we propose
VideoMoCo for unsupervised video representation learning. Given a video sequence as an …

Human-to-robot imitation in the wild

S Bahl, A Gupta, D Pathak - arXiv preprint arXiv:2207.09450, 2022 - arxiv.org
We approach the problem of learning by watching humans in the wild. While traditional
approaches in Imitation and Reinforcement Learning are promising for learning in the real …

Efficient training of visual transformers with small datasets

Y Liu, E Sangineto, W Bi, N Sebe… - Advances in Neural …, 2021 - proceedings.neurips.cc
Visual Transformers (VTs) are emerging as an architectural paradigm alternative to
Convolutional networks (CNNs). Differently from CNNs, VTs can capture global relations …