Open-vocabulary video anomaly detection

P Wu, X Zhou, G Pang, Y Sun, J Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Current video anomaly detection (VAD) approaches with weak supervision are inherently limited to a closed-set setting and may struggle in open-world applications where there can …

Audio-visual segmentation via unlabeled frame exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames. Although great progress has been witnessed, we experimentally reveal that current methods …

AttrSeg: open-vocabulary semantic segmentation via attribute decomposition-aggregation

C Ma, Y Yang, C Ju, F Zhang… - Advances in neural …, 2023 - proceedings.neurips.cc
Open-vocabulary semantic segmentation is a challenging task that requires segmenting
novel object categories at inference time. Recent works explore vision-language pre-training …

Distilling vision-language pre-training to collaborate with weakly-supervised temporal action localization

C Ju, K Zheng, J Liu, P Zhao, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action
instances with only category labels. Most methods widely adopt the off-the-shelf …

Turbo: Informativity-driven acceleration plug-in for vision-language large models

C Ju, H Wang, H Cheng, X Chen, Z Zhai… - … on Computer Vision, 2024 - Springer
Vision-Language Large Models (VLMs) have recently become the primary backbone of AI due to their impressive performance. However, their expensive computation costs, i.e. …

Zero-shot temporal action detection by learning multimodal prompts and text-enhanced actionness

A Raza, B Yang, Y Zou - … on Circuits and Systems for Video …, 2024 - ieeexplore.ieee.org
Zero-shot temporal action detection (ZS-TAD), aiming to recognize and detect new and
unseen video actions, is an emerging and challenging task with limited solutions. Recent …

Denoiser: Rethinking the robustness for open-vocabulary action recognition

H Cheng, C Ju, H Wang, J Liu, M Chen, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action Recognition (OVAR) has recently gained increasing attention with the development of vision …

Turbo: informativity-driven acceleration plug-in for vision-language models

C Ju, H Wang, Z Li, X Chen, Z Zhai, W Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Vision-Language Large Models (VLMs) have become the primary backbone of AI due to their impressive performance. However, their expensive computation costs, i.e., throughput and …

Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training

H Wang, C Ju, W Lin, S Xiao, M Chen, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving field of vision-language models (VLMs), contrastive language-image pre-training (CLIP) has made significant strides, becoming the foundation for various downstream …

Com-STAL: Compositional spatio-temporal action localization

S Wang, R Yan, P Huang, G Dai… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Spatio-temporal action localization aims to locate the spatial and temporal positions of
actors and classify their actions. However, prior research overlooks the fact that human …