Bridging the gap: A unified video comprehension framework for moment retrieval and highlight detection

Y Xiao, Z Luo, Y Liu, Y Ma, H Bian… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted
significant attention due to the growing demand for video analysis. Recent approaches treat …

Open-vocabulary segmentation with semantic-assisted calibration

Y Liu, S Bai, G Li, Y Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper studies open-vocabulary segmentation (OVS) through calibrating in-vocabulary
and domain-biased embedding space with generalized contextual prior of CLIP. As the core …

Universal segmentation at arbitrary granularity with language instruction

Y Liu, C Zhang, Y Wang, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper aims to achieve universal segmentation of arbitrary semantic level. Despite
significant progress in recent years, specialist segmentation approaches are limited to …

Decoupling static and hierarchical motion perception for referring video segmentation

S He, H Ding - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Referring video segmentation relies on natural language expressions to identify and
segment objects, often emphasizing motion clues. Previous works treat a sentence as a …

Towards noise-tolerant speech-referring video object segmentation: Bridging speech and text

X Li, J Wang, X Xu, M Yang, F Yang… - Proceedings of the …, 2023 - aclanthology.org
Linguistic communication is prevalent in Human-Computer Interaction (HCI). Speech
(spoken language) serves as a convenient yet potentially ambiguous form due to noise and …

LoSh: Long-short text joint prediction network for referring video object segmentation

L Yuan, M Shi, Z Yue, Q Chen - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Referring video object segmentation (RVOS) aims to segment the target instance referred by
a given text expression in a video clip. The text expression normally contains sophisticated …

Temporally consistent referring video object segmentation with hybrid memory

B Miao, M Bennamoun, Y Gao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining
consistent object segmentation due to temporal context variability and the presence of other …

Explicit Granularity and Implicit Scale Correspondence Learning for Point-Supervised Video Moment Localization

K Wang, H Liu, L Jie, Z Li, Y Hu, L Nie - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Video moment localization (VML) aims to identify the temporal boundary semantically
matching the given query. Point-supervised VML balances localization accuracy and …

Efficient prompt tuning of large vision-language model for fine-grained ship classification

L Lan, F Wang, X Zheng, Z Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Remote-sensing fine-grained ship classification (RS-FGSC) poses a significant challenge
due to the high similarity between classes and the limited availability of labeled data, limiting …

Cross-modal cognitive consensus guided audio-visual segmentation

Z Shi, Q Wu, F Meng, L Xu, H Li - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Audio-Visual Segmentation (AVS) aims to extract the sounding object from a video frame,
which is represented by a pixel-wise segmentation mask for application scenarios such as …