Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Effectiveness assessment of recent large vision-language models

Y Jiang, X Yan, GP Ji, K Fu, M Sun, H Xiong, DP Fan… - Visual Intelligence, 2024 - Springer
The advent of large vision-language models (LVLMs) represents a remarkable advance in
the quest for artificial general intelligence. However, the models' effectiveness in both …

SegPoint: Segment any point cloud via large language model

S He, H Ding, X Jiang, B Wen - European Conference on Computer Vision, 2024 - Springer
Despite significant progress in 3D point cloud segmentation, existing methods primarily
address specific tasks and depend on explicit instructions to identify targets, lacking the …

PrimitiveNet: Decomposing the global constraints for referring segmentation

C Liu, X Jiang, H Ding - Visual Intelligence, 2024 - Springer
In referring segmentation, modeling the complicated constraints in the multimodal
information is one of the most challenging problems. As the information in a given language …

RefMask3D: Language-guided transformer for 3D referring segmentation

S He, H Ding - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
3D referring segmentation is an emerging and challenging vision-language task that aims to
segment the object described by a natural language expression in a point cloud scene. The …

Temporally consistent referring video object segmentation with hybrid memory

B Miao, M Bennamoun, Y Gao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining
consistent object segmentation due to temporal context variability and the presence of other …

PVUW 2024 challenge on complex video understanding: Methods and results

H Ding, C Liu, Y Wei, N Ravi, S He, S Bai, P Torr… - arXiv preprint arXiv …, 2024 - arxiv.org
Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video
understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object …

One token to seg them all: Language instructed reasoning segmentation in videos

Z Bai, T He, H Mei, P Wang, Z Gao, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce VideoLISA, a video-based multimodal large language model designed to
tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the …

Motion-grounded video reasoning: Understanding and perceiving motion at pixel level

A Deng, T Chen, S Yu, T Yang, L Spencer… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce Motion-Grounded Video Reasoning, a new motion
understanding task that requires generating visual answers (video segmentation masks) …

LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation

H Ding, L Hong, C Liu, N Xu, L Yang, Y Fan… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the promising performance of current video segmentation models on existing
benchmarks, these models still struggle with complex scenes. In this paper, we introduce the …