A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

ReMamber: Referring image segmentation with Mamba twister

Y Yang, C Ma, J Yao, Z Zhong, Y Zhang… - European Conference on …, 2024 - Springer
Referring Image Segmentation (RIS) leveraging transformers has achieved great
success on the interpretation of complex visual-language tasks. However, the quadratic …

LLaFS: When large language models meet few-shot segmentation

L Zhu, T Chen, D Ji, J Ye, J Liu - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) in
few-shot segmentation. In contrast to the conventional few-shot segmentation methods that …

Uncovering prototypical knowledge for weakly open-vocabulary semantic segmentation

F Zhang, T Zhou, B Li, H He, C Ma… - Advances in …, 2023 - proceedings.neurips.cc
This paper studies the problem of weakly open-vocabulary semantic segmentation
(WOVSS), which learns to segment objects of arbitrary classes using mere image-text pairs …

LLMFormer: Large language model for open-vocabulary semantic segmentation

H Shi, SD Dao, J Cai - International Journal of Computer Vision, 2024 - Springer
Open-vocabulary (OV) semantic segmentation has attracted increasing attention in recent
years, which aims to recognize objects in an open class set for real-world applications …

Open panoramic segmentation

J Zheng, R Liu, Y Chen, K Peng, C Wu, K Yang… - … on Computer Vision, 2024 - Springer
Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial
information crucial for scene understanding. However, it is not only costly to obtain training …

Turbo: Informativity-driven acceleration plug-in for vision-language large models

C Ju, H Wang, H Cheng, X Chen, Z Zhai… - … on Computer Vision, 2024 - Springer
Vision-Language Large Models (VLMs) recently become primary backbone of AI,
due to the impressive performance. However, their expensive computation costs, i.e. …

Renovating Names in Open-Vocabulary Segmentation Benchmarks

H Huang, S Peng, D Zhang… - Advances in Neural …, 2025 - proceedings.neurips.cc
Names are essential to both human cognition and vision-language models. Open-
vocabulary models utilize class names as text prompts to generalize to categories unseen …

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

MEA Boudjoghra, A Dai, J Lahoud, H Cholakkal… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent works on open-vocabulary 3D instance segmentation show strong promise, but at
the cost of slow inference speed and high computation requirements. This high computation …

Denoiser: Rethinking the robustness for open-vocabulary action recognition

H Cheng, C Ju, H Wang, J Liu, M Chen, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action
Recognition (OVAR) recently gains increasing attention, with the development of vision …