Grounded SAM: Assembling open-world models for diverse visual tasks

T Ren, S Liu, A Zeng, J Lin, K Li, H Cao, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to
combine with the segment anything model (SAM). This integration enables the detection and …

SkeletonMAE: Graph-based masked autoencoder for skeleton sequence pre-training

H Yan, Y Liu, Y Wei, Z Li, G Li… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Skeleton sequence representation learning has shown great advantages for action
recognition due to its promising ability to model human joints and topology. However, the …

Deep learning technique for human parsing: A survey and outlook

L Yang, W Jia, S Li, Q Song - International Journal of Computer Vision, 2024 - Springer
Human parsing aims to partition humans in image or video into multiple pixel-level semantic
parts. In the last decade, it has gained significantly increased interest in the computer vision …

HumanMAC: Masked motion completion for human motion prediction

LH Chen, J Zhang, Y Li, Y Pang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human motion prediction is a classical problem in computer vision and computer graphics,
which has a wide range of practical applications. Previous efforts achieve great empirical …

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

J Yang, X Niu, N Jiang, R Zhang, S Huang - European Conference on …, 2024 - Springer
Existing 3D human object interaction (HOI) datasets and models simply align global
descriptions with the long HOI sequence, while lacking a detailed understanding of …

Unified Human-centric Model, Framework and Benchmark: A Survey

X Zhao, S Sulaiman, WY Leng - IEEE Access, 2024 - ieeexplore.ieee.org
Human-centric Computer Vision Tasks (HCTs) refer to a series of tasks related to the human
body, such as Human Pose Estimation, Pedestrian Tracking, Re-Identification (ReID) …

X-Pose: Detecting any keypoints

J Yang, A Zeng, R Zhang, L Zhang - European Conference on Computer …, 2024 - Springer
This work aims to address an advanced keypoint detection problem: how to accurately
detect any keypoints in complex real-world scenarios, which involves massive, messy, and …

OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing

P Gupta, R Singh, P Shenoy… - European Conference on …, 2024 - Springer
Multi-object multi-part scene segmentation is a challenging task whose complexity scales
exponentially with part granularity and number of scene objects. To address the task, we …

From Simple to Complex Scenes: Learning Robust Feature Representations for Accurate Human Parsing

Y Liu, C Wang, M Lu, J Yang, J Gui… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Human parsing has attracted considerable research interest due to its broad potential
applications in the computer vision community. In this paper, we explore several useful …

KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

J Yang, W Zeng, S Jin, L Xu, W Liu, C Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Multimodal Large Language Models (MLLMs) have greatly
improved their abilities in image understanding. However, these models often struggle with …