Anticipative video transformer

R Girdhar, K Grauman - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
We propose Anticipative Video Transformer (AVT), an end-to-end attention-based
video modeling architecture that attends to the previously observed video in order to …

Self-supervised visual feature learning with deep neural networks: A survey

L Jing, Y Tian - IEEE transactions on pattern analysis and …, 2020 - ieeexplore.ieee.org
Large-scale labeled data are generally required to train deep neural networks in order to
obtain better performance in visual feature learning from images or videos for computer …

A review of predictive and contrastive self-supervised learning for medical images

WC Wang, E Ahn, D Feng, J Kim - Machine Intelligence Research, 2023 - Springer
Over the last decade, supervised deep learning on manually annotated big data has been
progressing significantly on computer vision tasks. But, the application of deep learning in …

Self-supervised learning by cross-modal audio-video clustering

H Alwassel, D Mahajan, B Korbar… - Advances in …, 2020 - proceedings.neurips.cc
Visual and audio modalities are highly correlated, yet they contain different information.
Their strong correlation makes it possible to predict the semantics of one from the other with …

Self-supervised learning for medical image analysis using image context restoration

L Chen, P Bentley, K Mori, K Misawa, M Fujiwara… - Medical image …, 2019 - Elsevier
Machine learning, particularly deep learning has boosted medical image analysis
over the past years. Training a good model based on deep learning requires large amount …

Memory-augmented dense predictive coding for video representation learning

T Han, W Xie, A Zisserman - European conference on computer vision, 2020 - Springer
The objective of this paper is self-supervised learning from video, in particular for
representations for action recognition. We make the following contributions: (i) We propose a …

Video representation learning by dense predictive coding

T Han, W Xie, A Zisserman - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
The objective of this paper is self-supervised learning of spatio-temporal embeddings from
video, suitable for human action recognition. We make three contributions: First, we …

Slow down to go better: A survey on slow feature analysis

P Song, C Zhao - IEEE Transactions on Neural Networks and …, 2022 - ieeexplore.ieee.org
Temporal data contain a wealth of valuable information, playing an essential role in various
machine-learning tasks. Slow feature analysis (SFA), one of the most classic temporal …

Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection

JX Zhong, N Li, W Kong, S Liu… - Proceedings of the …, 2019 - openaccess.thecvf.com
Video anomaly detection under weak labels is formulated as a typical multiple-instance
learning problem in previous works. In this paper, we provide a new perspective, i.e., a …

Segmenting objects from relational visual data

X Lu, W Wang, J Shen, DJ Crandall… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
In this article, we model a set of pixelwise object segmentation tasks—automatic video
segmentation (AVS), image co-segmentation (ICS) and few-shot semantic segmentation …