A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

SimDA: Simple diffusion adapter for efficient video generation

Z Xing, Q Dai, H Hu, Z Wu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The recent wave of AI-generated content has witnessed the great development and success
of Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of …

Masked video distillation: Rethinking masked feature modeling for self-supervised video representation learning

R Wang, D Chen, Z Wu, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Benefiting from masked visual modeling, self-supervised video representation learning has
achieved remarkable progress. However, existing methods focus on learning …

Prototypical residual networks for anomaly detection and localization

H Zhang, Z Wu, Z Wang, Z Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Anomaly detection and localization are widely used in industrial manufacturing for their
efficiency and effectiveness. Anomalies are rare and hard to collect, and supervised models …

Implicit temporal modeling with learnable alignment for video recognition

S Tu, Q Dai, Z Wu, ZQ Cheng, H Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive language-image pretraining (CLIP) has demonstrated remarkable success in
various image tasks. However, how to extend CLIP with effective temporal modeling is still …

Open-VCLIP: Transforming CLIP to an open-vocabulary video model via interpolated weight optimization

Z Weng, X Yang, A Li, Z Wu… - … Conference on Machine …, 2023 - proceedings.mlr.press
Contrastive Language-Image Pretraining (CLIP) has demonstrated impressive zero-
shot learning abilities for image understanding, yet limited effort has been made to …

MotionEditor: Editing video motion via content-aware diffusion

S Tu, Q Dai, ZQ Cheng, H Hu, X Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Existing diffusion-based video editing models have made notable advances in editing
attributes of a source video over time but struggle to manipulate the motion information while …

XVO: Generalized visual odometry via cross-modal self-training

L Lai, Z Shangguan, J Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose XVO, a semi-supervised learning method for training generalized monocular
Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and …

vid-TLDR: Training-free token merging for light-weight video transformer

J Choi, S Lee, J Chu, M Choi… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Video Transformers have become the prevalent solution for various video downstream tasks
with superior expressive power and flexibility. However, these video transformers suffer from …

CLIP-TSA: CLIP-assisted temporal self-attention for weakly-supervised video anomaly detection

HK Joo, K Vo, K Yamazaki, N Le - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Video anomaly detection (VAD), commonly formulated as a multiple-instance learning
problem in a weakly-supervised manner due to its labor-intensive nature, is a challenging …