Transformers in vision: A survey
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …
Differentiable rendering: A survey
Deep neural networks (DNNs) have shown remarkable performance improvements on
vision-related tasks such as object detection or image segmentation. Despite their success …
AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time
Accurate whole-body multi-person pose estimation and tracking is an important yet
challenging topic in computer vision. To capture the subtle actions of humans for complex …
FastViT: A fast hybrid vision transformer using structural reparameterization
The recent amalgamation of transformer and convolutional designs has led to steady
improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a …
Effective whole-body pose estimation with two-stages distillation
Whole-body pose estimation localizes the human body, hand, face, and foot keypoints in an
image. This task is challenging due to multi-scale body parts, fine-grained localization for …
TAPIR: Tracking any point with per-frame initialization and temporal refinement
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried
point on any physical surface throughout a video sequence. Our approach employs two …
Reconstructing hands in 3D with transformers
We present an approach that can reconstruct hands in 3D from monocular input. Our
approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture …
One-stage 3D whole-body mesh recovery with component aware transformer
Whole-body mesh recovery aims to estimate the 3D human body, face, and hands
parameters from a single image. It is challenging to perform this task with a single network …
Source-free domain adaptive human pose estimation
Human Pose Estimation (HPE) is widely used in various fields, including motion
analysis, healthcare, and virtual reality. However, the great expenses of labeled real-world …
End-to-end human pose and mesh reconstruction with transformers
We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human
pose and mesh vertices from a single image. Our method uses a transformer encoder to …