Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
Effectiveness assessment of recent large vision-language models
The advent of large vision-language models (LVLMs) represents a remarkable advance in
the quest for artificial general intelligence. However, the models' effectiveness in both …
SegPoint: Segment any point cloud via large language model
Despite significant progress in 3D point cloud segmentation, existing methods primarily
address specific tasks and depend on explicit instructions to identify targets, lacking the …
PrimitiveNet: decomposing the global constraints for referring segmentation
In referring segmentation, modeling the complicated constraints in the multimodal
information is one of the most challenging problems. As the information in a given language …
RefMask3D: Language-guided transformer for 3D referring segmentation
3D referring segmentation is an emerging and challenging vision-language task that aims to
segment the object described by a natural language expression in a point cloud scene. The …
Temporally consistent referring video object segmentation with hybrid memory
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining
consistent object segmentation due to temporal context variability and the presence of other …
PVUW 2024 challenge on complex video understanding: Methods and results
The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video
understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object …
One token to seg them all: Language instructed reasoning segmentation in videos
We introduce VideoLISA, a video-based multimodal large language model designed to
tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the …
Motion-grounded video reasoning: Understanding and perceiving motion at pixel level
In this paper, we introduce Motion-Grounded Video Reasoning, a new motion
understanding task that requires generating visual answers (video segmentation masks) …
LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation
Despite the promising performance of current video segmentation models on existing
benchmarks, these models still struggle with complex scenes. In this paper, we introduce the …