- Academic Search

Open-vclip: Transforming clip to an open-vocabulary video model via interpolated weight optimization

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Cited by: 125; all 10 versions

[PDF] thecvf.com

Implicit temporal modeling with learnable alignment for video recognition

S Tu, Q Dai, Z Wu, ZQ Cheng, H Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Contrastive language-image pretraining (CLIP) has demonstrated remarkable success in
various image tasks. However, how to extend CLIP with effective temporal modeling is still …

Cited by: 37; all 9 versions

[PDF] thecvf.com

Open-vocabulary video anomaly detection

P Wu, X Zhou, G Pang, Y Sun, J Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Current video anomaly detection (VAD) approaches with weak supervisions are inherently
limited to a closed-set setting and may struggle in open-world applications where there can …

Cited by: 20; all 4 versions

[PDF] thecvf.com

Improving adversarial robustness of masked autoencoders via test-time frequency-domain prompting

Q Huang, X Dong, D Chen, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this paper, we investigate the adversarial robustness of vision transformers that are
equipped with BERT pretraining (e.g., BEiT, MAE). A surprising observation is that MAE has …

Cited by: 11; all 5 versions

[PDF] thecvf.com

HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending

T Wei, D Chen, W Zhou, J Liao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Hair editing has made tremendous progress in recent years. Early hair editing methods use
well-drawn sketches or masks to specify the editing conditions. Even though they can …

Cited by: 15; all 5 versions

[PDF] arxiv.org

Building an open-vocabulary video CLIP model with better architectures, optimization and data

Z Wu, Z Weng, W Peng, X Yang, A Li… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in
zero-shot image recognition, limited effort has been made exploring its potential for zero …

Cited by: 16; all 7 versions

[PDF] neurips.cc

Learning from rich semantics and coarse locations for long-tailed object detection

L Meng, X Dai, J Yang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc

Long-tailed object detection (LTOD) aims to handle the extreme data imbalance in real-
world datasets, where many tail classes have scarce instances. One popular strategy is to …

Cited by: 9; all 7 versions

[PDF] thecvf.com

Chartreader: A unified framework for chart derendering and comprehension without heuristic rules

ZQ Cheng, Q Dai… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Charts are a powerful tool for visually conveying complex data, but their comprehension
poses a challenge due to the diverse chart types and intricate components. Existing chart …

Cited by: 18; all 8 versions

[PDF] arxiv.org

3dstyle-diffusion: Pursuing fine-grained text-driven 3d stylization with 2d diffusion models

H Yang, Y Chen, Y Pan, T Yao, Z Chen… - Proceedings of the 31st …, 2023 - dl.acm.org

3D content creation via text-driven stylization has played a fundamental challenge to
multimedia and graphics community. Recent advances of cross-modal foundation models …

Cited by: 15; all 3 versions

[PDF] arxiv.org

Leveraging temporal contextualization for video action recognition

M Kim, D Han, T Kim, B Han - European Conference on Computer Vision, 2024 - Springer

We propose a novel framework for video understanding, called Temporally Contextualized
CLIP (TC-CLIP), which leverages essential temporal information through global interactions …

Cited by: 3; all 2 versions
