Biformer: Vision transformer with bi-level routing attention

L Zhu, X Wang, Z Ke, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
As the core building block of vision transformers, attention is a powerful tool to capture
long-range dependency. However, such power comes at a cost: it incurs a huge computation …

Chat-univi: Unified visual representation empowers large language models with image and video understanding

P **, R Takanobu, W Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large language models have demonstrated impressive universal capabilities across a wide
range of open-ended tasks and have extended their utility to encompass multimodal …

Effective whole-body pose estimation with two-stages distillation

Z Yang, A Zeng, C Yuan, Y Li - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Whole-body pose estimation localizes the human body, hand, face, and foot keypoints in an
image. This task is challenging due to multi-scale body parts, fine-grained localization for …

Beyond appearance: a semantic controllable self-supervised learning framework for human-centric visual tasks

W Chen, X Xu, J Jia, H Luo, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human-centric visual tasks have attracted increasing research attention due to their
widespread applications. In this paper, we aim to learn a general human representation from …

Dynamic neural network structure: A review for its theories and applications

J Guo, CLP Chen, Z Liu, X Yang - IEEE Transactions on Neural …, 2024 - ieeexplore.ieee.org
The dynamic neural network (DNN), in contrast to the static counterpart, offers numerous
advantages, such as improved accuracy, efficiency, and interpretability. These benefits stem …

Joint token pruning and squeezing towards more aggressive compression of vision transformers

S Wei, T Ye, S Zhang, Y Tang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Although vision transformers (ViTs) have shown promising results in various computer vision
tasks recently, their high computational cost limits their practical applications. Previous …

Hourglass tokenizer for efficient transformer-based 3D human pose estimation

W Li, M Liu, H Liu, P Wang, J Cai… - Proceedings of the …, 2024 - openaccess.thecvf.com
Transformers have been successfully applied in the field of video-based 3D human pose
estimation. However, the high computational costs of these video pose transformers (VPTs) …