A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

End-to-end human pose and mesh reconstruction with transformers

K Lin, L Wang, Z Liu - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human
pose and mesh vertices from a single image. Our method uses a transformer encoder to …

TransVG: End-to-end visual grounding with transformers

J Deng, Z Yang, T Chen, W Zhou… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
In this paper, we present a neat yet effective transformer-based framework for visual
grounding, namely TransVG, to address the task of grounding a language query to the …

HiFT: Hierarchical feature transformer for aerial tracking

Z Cao, C Fu, J Ye, B Li, Y Li - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Most existing Siamese-based tracking methods execute the classification and regression of
the target object based on the similarity maps. However, they either employ a single map …

Survey on depth and RGB image-based 3D hand shape and pose estimation

L Huang, B Zhang, Z Guo, Y Xiao, Z Cao… - Virtual Reality & Intelligent …, 2021 - Elsevier
The field of vision-based human hand three-dimensional (3D) shape and pose estimation
has attracted significant attention recently owing to its key role in various applications, such …

TransPose: Keypoint localization via transformer

S Yang, Z Quan, M Nie, W Yang - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
While CNN-based models have made remarkable progress on human pose estimation,
what spatial dependencies they capture to localize keypoints remains unclear. In this work …

A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

E^2VPT: An effective and efficient approach for visual prompt tuning

C Han, Q Wang, Y Cui, Z Cao, W Wang, S Qi… - arXiv preprint arXiv …, 2023 - arxiv.org
As the size of transformer-based models continues to grow, fine-tuning these large-scale
pretrained vision models for new tasks has become increasingly parameter-intensive …

HandOccNet: Occlusion-robust 3D hand mesh estimation network

JK Park, Y Oh, G Moon, H Choi… - Proceedings of the …, 2022 - openaccess.thecvf.com
Hands are often severely occluded by objects, which makes 3D hand mesh estimation
challenging. Previous works often have disregarded information at occluded regions …

PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization

Z Yuan, C Xue, Y Chen, Q Wu, G Sun - European conference on computer …, 2022 - Springer
Quantization is one of the most effective methods to compress neural networks, which has
achieved great success on convolutional neural networks (CNNs). Recently, vision …