A survey on self-supervised learning: Algorithms, applications, and future trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

A review of convolutional neural network architectures and their optimizations

S Cong, Y Zhou - Artificial Intelligence Review, 2023 - Springer
The research advances concerning the typical architectures of convolutional neural
networks (CNNs) as well as their optimizations are analyzed and elaborated in detail in this …

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

MOSE: A new dataset for video object segmentation in complex scenes

H Ding, C Liu, S He, X Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video object segmentation (VOS) aims at segmenting a particular object throughout the
entire video clip sequence. The state-of-the-art VOS methods have achieved excellent …

Emerging properties in self-supervised vision transformers

M Caron, H Touvron, I Misra, H Jégou… - Proceedings of the …, 2021 - openaccess.thecvf.com
In this paper, we question if self-supervised learning provides new properties to Vision
Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the …

TAP-Vid: A benchmark for tracking any point in a video

C Doersch, A Gupta, L Markeeva… - Advances in …, 2022 - proceedings.neurips.cc
Generic motion understanding from video involves not only tracking objects, but also
perceiving how their surfaces deform and move. This information is useful to make …

Keeping your eye on the ball: Trajectory attention in video transformers

M Patrick, D Campbell, Y Asano… - Advances in neural …, 2021 - proceedings.neurips.cc
In video transformers, the time dimension is often treated in the same way as the two spatial
dimensions. However, in a scene where objects or the camera may move, a physical point …

A generalist framework for panoptic segmentation of images and videos

T Chen, L Li, S Saxena, G Hinton… - Proceedings of the …, 2023 - openaccess.thecvf.com
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image.
As permutations of instance IDs are also valid solutions, the task requires learning of high …

Particle video revisited: Tracking through occlusions using point trajectories

AW Harley, Z Fang, K Fragkiadaki - European Conference on Computer …, 2022 - Springer
Tracking pixels in videos is typically studied as an optical flow estimation problem, where
every pixel is described with a displacement vector that locates it in the next frame. Even …

Self-supervised co-training for video representation learning

T Han, W Xie, A Zisserman - Advances in neural information …, 2020 - proceedings.neurips.cc
The objective of this paper is visual-only self-supervised video representation learning. We
make the following contributions: (i) we investigate the benefit of adding semantic-class …