A survey on self-supervised learning: Algorithms, applications, and future trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

Vision-based human pose estimation via deep learning: A survey

G Lan, Y Wu, F Hu, Q Hao - IEEE Transactions on Human …, 2022 - ieeexplore.ieee.org
Human pose estimation (HPE) has attracted a significant amount of attention from the
computer vision community in the past decades. Moreover, HPE has been applied to various …

Humans in 4D: Reconstructing and tracking humans with transformers

S Goel, G Pavlakos, J Rajasegaran… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present an approach to reconstruct humans and track them over time. At the core of our
approach, we propose a fully "transformerized" version of a network for human mesh …

Reconstructing hands in 3D with transformers

G Pavlakos, D Shan, I Radosavovic… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present an approach that can reconstruct hands in 3D from monocular input. Our
approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture …

Advancing plain vision transformer toward remote sensing foundation model

D Wang, Q Zhang, Y Xu, J Zhang, B Du… - … on Geoscience and …, 2022 - ieeexplore.ieee.org
Large-scale vision foundation models have made significant progress in visual tasks on
natural images, with vision transformers (ViTs) being the primary choice due to their good …

Motion-X: A large-scale 3D expressive whole-body human motion dataset

J Lin, A Zeng, S Lu, Y Cai, R Zhang… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we present Motion-X, a large-scale 3D expressive whole-body motion dataset.
Existing motion datasets predominantly contain body-only poses, lacking facial expressions …

One-stage 3D whole-body mesh recovery with component aware transformer

J Lin, A Zeng, H Wang, L Zhang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Whole-body mesh recovery aims to estimate the 3D human body, face, and hands
parameters from a single image. It is challenging to perform this task with a single network …

WHAM: Reconstructing world-grounded humans with accurate 3D motion

S Shin, J Kim, E Halilaj… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The estimation of 3D human motion from video has progressed rapidly but current methods
still have several key limitations. First, most methods estimate the human in camera …

Decoupling human and camera motion from videos in the wild

V Ye, G Pavlakos, J Malik… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a method to reconstruct global human trajectories from videos in the wild. Our
optimization method decouples the camera and human motion, which allows us to place …

ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond

Q Zhang, Y Xu, J Zhang, D Tao - International Journal of Computer Vision, 2023 - Springer
Vision transformers have shown great potential in various computer vision tasks owing to
their strong capability to model long-range dependency using the self-attention mechanism …