- Academic Search

Bridging the gap: A unified video comprehension framework for moment retrieval and highlight detection

Y Xiao, Z Luo, Y Liu, Y Ma, H Bian… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted
significant attention due to the growing demand for video analysis. Recent approaches treat …

Cited by 31

Open-vocabulary segmentation with semantic-assisted calibration

Y Liu, S Bai, G Li, Y Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

This paper studies open-vocabulary segmentation (OVS) through calibrating in-vocabulary
and domain-biased embedding space with generalized contextual prior of CLIP. As the core …

Cited by 21

Universal segmentation at arbitrary granularity with language instruction

Y Liu, C Zhang, Y Wang, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper aims to achieve universal segmentation of arbitrary semantic level. Despite
significant progress in recent years, specialist segmentation approaches are limited to …

Cited by 11

Decoupling static and hierarchical motion perception for referring video segmentation

S He, H Ding - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Referring video segmentation relies on natural language expressions to identify and
segment objects, often emphasizing motion clues. Previous works treat a sentence as a …

Cited by 20

Towards noise-tolerant speech-referring video object segmentation: Bridging speech and text

X Li, J Wang, X Xu, M Yang, F Yang… - Proceedings of the …, 2023 - aclanthology.org

Linguistic communication is prevalent in Human-Computer Interaction (HCI). Speech
(spoken language) serves as a convenient yet potentially ambiguous form due to noise and …

Cited by 15

Losh: Long-short text joint prediction network for referring video object segmentation

L Yuan, M Shi, Z Yue, Q Chen - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Referring video object segmentation (RVOS) aims to segment the target instance referred by
a given text expression in a video clip. The text expression normally contains sophisticated …

Cited by 7

Temporally consistent referring video object segmentation with hybrid memory

B Miao, M Bennamoun, Y Gao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining
consistent object segmentation due to temporal context variability and the presence of other …

Cited by 2

Explicit Granularity and Implicit Scale Correspondence Learning for Point-Supervised Video Moment Localization

K Wang, H Liu, L Jie, Z Li, Y Hu, L Nie - Proceedings of the 32nd ACM …, 2024 - dl.acm.org

Video moment localization (VML) aims to identify the temporal boundary semantically
matching the given query. Point-supervised VML balances localization accuracy and …

Cited by 1

Efficient prompt tuning of large vision-language model for fine-grained ship classification

L Lan, F Wang, X Zheng, Z Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Remote-sensing fine-grained ship classification (RS-FGSC) poses a significant challenge
due to the high similarity between classes and the limited availability of labeled data, limiting …

Cited by 3

Cross-modal cognitive consensus guided audio-visual segmentation

Z Shi, Q Wu, F Meng, L Xu, H Li - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Audio-Visual Segmentation (AVS) aims to extract the sounding object from a video frame,
which is represented by a pixel-wise segmentation mask for application scenarios such as …

Cited by 4

Soc: Semantic-assisted object cluster for referring video object segmentation