Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org

Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Cited by 2930

Deep learning-based human pose estimation: A survey

C Zheng, W Wu, C Chen, T Yang, S Zhu, J Shen… - ACM Computing …, 2023 - dl.acm.org

Human pose estimation aims to locate the human body parts and build human body
representation (e.g., body skeleton) from input data such as images and videos. It has drawn …

Cited by 582

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Cited by 2678

Multiscale vision transformers

H Fan, B Xiong, K Mangalam, Y Li… - Proceedings of the …, 2021 - openaccess.thecvf.com

We present Multiscale Vision Transformers (MViT) for video and image recognition,
by connecting the seminal idea of multiscale feature hierarchies with transformer models …

Cited by 1542

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Cited by 629

Humans in 4D: Reconstructing and tracking humans with transformers

S Goel, G Pavlakos, J Rajasegaran… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present an approach to reconstruct humans and track them over time. At the core of our
approach, we propose a fully "transformerized" version of a network for human mesh …

Cited by 173

3D human pose estimation with spatial and temporal transformers

C Zheng, S Zhu, M Mendieta, T Yang… - Proceedings of the …, 2021 - openaccess.thecvf.com

Transformer architectures have become the model of choice in natural language processing
and are now being introduced into computer vision tasks such as image classification, object …

Cited by 597

MHFormer: Multi-hypothesis transformer for 3D human pose estimation

W Li, H Liu, H Tang, P Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com

Estimating 3D human poses from monocular videos is a challenging task due to depth
ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting …

Cited by 371

CLIFF: Carrying location information in full frames into human pose and shape estimation

Z Li, J Liu, Z Zhang, S Xu, Y Yan - European Conference on Computer …, 2022 - Springer

Top-down methods dominate the field of 3D human pose and shape estimation, because
they are decoupled from human detection and allow researchers to focus on the core …

Cited by 237

MixSTE: Seq2seq mixed spatio-temporal encoder for 3D human pose estimation in video

J Zhang, Z Tu, J Yang, Y Chen… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Recent transformer-based solutions have been introduced to estimate 3D human pose from
2D keypoint sequence by considering body joints among all frames globally to learn spatio …

Cited by 291
