Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Differentiable rendering: A survey

H Kato, D Beker, M Morariu, T Ando… - arXiv preprint arXiv …, 2020 - arxiv.org
Deep neural networks (DNNs) have shown remarkable performance improvements on
vision-related tasks such as object detection or image segmentation. Despite their success …

AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time

HS Fang, J Li, H Tang, C Xu, H Zhu… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Accurate whole-body multi-person pose estimation and tracking is an important yet
challenging topic in computer vision. To capture the subtle actions of humans for complex …

FastViT: A fast hybrid vision transformer using structural reparameterization

PKA Vasu, J Gabriel, J Zhu, O Tuzel… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent amalgamation of transformer and convolutional designs has led to steady
improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a …

Effective whole-body pose estimation with two-stages distillation

Z Yang, A Zeng, C Yuan, Y Li - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Whole-body pose estimation localizes the human body, hand, face, and foot keypoints in an
image. This task is challenging due to multi-scale body parts, fine-grained localization for …

TAPIR: Tracking any point with per-frame initialization and temporal refinement

C Doersch, Y Yang, M Vecerik… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried
point on any physical surface throughout a video sequence. Our approach employs two …

Reconstructing hands in 3D with transformers

G Pavlakos, D Shan, I Radosavovic… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present an approach that can reconstruct hands in 3D from monocular input. Our
approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture …

One-stage 3D whole-body mesh recovery with component aware transformer

J Lin, A Zeng, H Wang, L Zhang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Whole-body mesh recovery aims to estimate the 3D human body, face, and hands
parameters from a single image. It is challenging to perform this task with a single network …

Source-free domain adaptive human pose estimation

Q Peng, C Zheng, C Chen - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Human Pose Estimation (HPE) is widely used in various fields, including motion
analysis, healthcare, and virtual reality. However, the great expenses of labeled real-world …

End-to-end human pose and mesh reconstruction with transformers

K Lin, L Wang, Z Liu - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human
pose and mesh vertices from a single image. Our method uses a transformer encoder to …