InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models

C Wei, Y Zhong, H Tan, Y Zeng, Y Liu, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Boosted by Multi-modal Large Language Models (MLLMs), text-guided universal
segmentation models for the image and video domains have made rapid progress recently …