Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

UniVS: Unified and universal video segmentation with prompts as queries

M Li, S Li, X Zhang, L Zhang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Despite the recent advances in unified image segmentation (IS), developing a unified video
segmentation (VS) model remains a challenge. This is mainly because generic category …

Unified embedding alignment for open-vocabulary video instance segmentation

H Fang, P Wu, Y Li, X Zhang, X Lu - European Conference on Computer …, 2024 - Springer
Open-Vocabulary Video Instance Segmentation (VIS) is attracting increasing
attention due to its ability to segment and track arbitrary objects. However, the recent Open …

TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning

J Li, X Zhou, K Jiang, L Hong, P Guo, Z Chen… - Proceedings of the …, 2024 - dl.acm.org
Multimodal fusion, leveraging data like vision and language, is rapidly gaining traction. This
enriched data representation improves performance across various tasks. Existing methods …

PanoVOS: Bridging non-panoramic and panoramic views with transformer for video segmentation

S Yan, X Xu, R Zhang, L Hong, W Chen… - … on Computer Vision, 2024 - Springer
Panoramic videos contain richer spatial information and have attracted tremendous amounts
of attention due to their exceptional experience in some fields such as autonomous driving …

Learning the What and How of Annotation in Video Object Segmentation

T Delatolas, V Kalogeiton… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video Object Segmentation (VOS) is crucial for several applications, from video
editing to video data generation. Training a VOS model requires an abundance of manually …

X-prompt: Multi-modal visual prompt for video object segmentation

P Guo, W Li, H Huang, L Hong, X Zhou… - Proceedings of the …, 2024 - dl.acm.org
Multi-modal Video Object Segmentation (VOS), including RGB-Thermal, RGB-Depth, and
RGB-Event, has garnered attention due to its capability to address challenging scenarios …

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation

Z Cheng, K Li, H Li, P Jin, C Liu, X Zheng, R Ji… - arXiv preprint arXiv …, 2024 - arxiv.org
Temporally locating objects with arbitrary class texts is the primary pursuit of open-
vocabulary Video Instance Segmentation (VIS). Because of the insufficient vocabulary of …

Towards Decision-based Sparse Attacks on Video Recognition

K Jiang, Z Chen, X Zhou, J Zhang, L Hong… - Proceedings of the 31st …, 2023 - dl.acm.org
Recent studies indicate that sparse attacks threaten the security of deep learning models,
which modify only a small set of pixels in the input based on the l0 norm constraint. While …