TIVE: A toolbox for identifying video instance segmentation errors
In this paper, we introduce TIVE, a Toolbox for Identifying Video instance segmentation
Errors. By directly operating on output prediction files, TIVE defines isolated error types and …
Learning semantical dynamics and spatiotemporal collaboration for human pose estimation in video
R Feng, H Chen - Neurocomputing, 2025 - Elsevier
Temporal modeling and spatio-temporal collaboration are pivotal techniques for video-based human pose estimation. Most state-of-the-art methods adopt optical flow or temporal …
Adept: Annotation-denoising auxiliary tasks with discrete cosine transform map and keypoint for human-centric pretraining
Human-centric perception is the core of diverse computer vision tasks and has been a long-standing research focus. However, previous research studied these human-centric tasks …
MaskRecon: High-quality human reconstruction via masked autoencoders using a single RGB-D image
In this paper, we explore reconstructing high-quality clothed 3D humans from a single RGB-D image, assuming that virtual humans can be represented by front-view and back-view …
Pose-guided hierarchical semantic decomposition and composition for human parsing
Human parsing is a fine-grained semantic segmentation task, which needs to understand
human semantic parts. Most existing methods model human parsing as a general semantic …
Crowded pose-guided multi-task learning for instance-level human parsing
Instance-level human parsing remains challenging due to the similarity between human
instances and background, complex interactions, and various poses. Aiming at assigning …
WNet: A dual‐encoded multi‐human parsing network
In recent years, multi‐human parsing has become a focal point in research, yet prevailing
methods often rely on intermediate stages and lack pixel‐level analysis. Moreover, their …
SP-YOLO: an end-to-end lightweight network for real-time human pose estimation
Y Zhang, Z Wang, M Li, P Gao - Signal, Image and Video Processing, 2024 - Springer
The traditional multi-person human pose estimation method has several problems including
low real-time detection effect, low recognition efficiency, and a large number of calculation …
Nondiscriminatory treatment: A straightforward framework for multi-human parsing
M Yan, G Zhang, T Zhang, Y Zhang - Neurocomputing, 2021 - Elsevier
Multi-human parsing aims to segment every body part of every human instance. Nearly all
state-of-the-art methods follow the “detection first” or “segmentation first” pipelines. Different …