OMG-Seg: Is one model good enough for all segmentation?

X Li, H Yuan, W Li, H Ding, S Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we address various segmentation tasks, each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently …

OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding

T Zhang, X Li, H Fei, H Yuan, S Wu, S Ji… - arXiv preprint arXiv …, 2024 - arxiv.org
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …

Point Cloud Mamba: Point cloud learning via state space model

T Zhang, X Li, H Yuan, S Ji, S Yan - arXiv preprint arXiv:2403.00762, 2024 - arxiv.org
In this work, for the first time, we demonstrate that Mamba-based point cloud methods can
outperform point-based methods. Mamba exhibits strong global modeling capabilities and …

Explore in-context segmentation via latent diffusion models

C Wang, X Li, H Ding, L Qi, J Zhang, Y Tong… - arXiv preprint arXiv …, 2024 - arxiv.org
In-context segmentation has drawn more attention with the introduction of vision foundation
models. Most existing approaches adopt metric learning or masked image modeling to build …

VG4D: Vision-Language Model Goes 4D Video Recognition

Z Deng, X Li, X Li, Y Tong, S Zhao, M Liu - arXiv preprint arXiv:2404.11605, 2024 - arxiv.org
Understanding the real world through point cloud video is a crucial aspect of robotics and
autonomous driving systems. However, prevailing methods for 4D point cloud recognition …

USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation

W Weng, H Wang, J He, L He, G Xie - arXiv preprint arXiv:2412.09220, 2024 - arxiv.org
Contrastive learning has achieved great success in skeleton-based representation learning
recently. However, the prevailing methods are predominantly negative-based, necessitating …

MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion

L Wu, L Lin, J Zhang, Y Ma, J Liu - European Conference on Computer …, 2024 - Springer
Self-supervised learning has proved effective for skeleton-based human action
understanding. However, previous works either rely on contrastive learning that suffers false …

Point-In-Context: Understanding Point Cloud via In-Context Learning

M Liu, Z Fang, X Li, JM Buhmann, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
With the emergence of large-scale models trained on diverse datasets, in-context learning
has emerged as a promising paradigm for multitasking, notably in natural language …

CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition

Y Wen, M Liu, S Wu, B Ding - arXiv preprint arXiv:2410.07153, 2024 - arxiv.org
Skeleton-based multi-entity action recognition is a challenging task aiming to identify
interactive actions or group activities involving multiple diverse entities. Existing models for …

MKTZ: multi-semantic embedding and key frame masking techniques for zero-shot skeleton action recognition

H Chen, S Guo, Z Chen - Multimedia Systems, 2024 - Springer
The fundamental task of zero-shot skeleton-based action recognition is to learn existing
skeletal actions during the training phase and to accurately identify unseen actions during …