Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Deep learning-based human pose estimation: A survey

C Zheng, W Wu, C Chen, T Yang, S Zhu, J Shen… - ACM Computing …, 2023 - dl.acm.org
Human pose estimation aims to locate the human body parts and build human body
representation (e.g., body skeleton) from input data such as images and videos. It has drawn …

MotionDiffuse: Text-driven human motion generation with diffusion model

M Zhang, Z Cai, L Pan, F Hong, X Guo, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
Human motion modeling is important for many modern graphics applications, which typically
require professional skills. In order to remove the skill barriers for laymen, recent motion …

Humans in 4D: Reconstructing and tracking humans with transformers

S Goel, G Pavlakos, J Rajasegaran… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present an approach to reconstruct humans and track them over time. At the core of our
approach, we propose a fully "transformerized" version of a network for human mesh …

Generating diverse and natural 3D human motions from text

C Guo, S Zou, X Zuo, S Wang, W Ji… - Proceedings of the …, 2022 - openaccess.thecvf.com
Automated generation of 3D human motions from text is a challenging problem. The
generated motions are expected to be sufficiently diverse to explore the text-grounded …

StableRep: Synthetic images from text-to-image models make strong visual representation learners

Y Tian, L Fan, P Isola, H Chang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in the light of the excellent …

AvatarCLIP: Zero-shot text-driven generation and animation of 3D avatars

F Hong, M Zhang, L Pan, Z Cai, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
3D avatar creation plays a crucial role in the digital age. However, the whole production
process is prohibitively time-consuming and labor-intensive. To democratize this technology …

TEMOS: Generating Diverse Human Motions from Textual Descriptions

M Petrovich, MJ Black, G Varol - European Conference on Computer …, 2022 - Springer
We address the problem of generating diverse 3D human motions from textual descriptions.
This challenging task requires joint modeling of both modalities: understanding and …

BEDLAM: A synthetic dataset of bodies exhibiting detailed lifelike animated motion

MJ Black, P Patel, J Tesch… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We show, for the first time, that neural networks trained only on synthetic data achieve
state-of-the-art accuracy on the problem of 3D human pose and shape (HPS) estimation from real …

HumanRF: High-fidelity neural radiance fields for humans in motion

M Işık, M Rünz, M Georgopoulos, T Khakhulin… - ACM Transactions on …, 2023 - dl.acm.org
Representing human performance at high-fidelity is an essential building block in diverse
applications, such as film production, computer games or videoconferencing. To close the …