InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
C Wei, Y Zhong, H Tan, Y Zeng, Y Liu, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Boosted by Multi-modal Large Language Models (MLLMs), text-guided universal segmentation models for the image and video domains have made rapid progress recently …