- Academic Search

OSV: One step is enough for high-quality image to video generation

X Mao, Z Jiang, FY Wang, W Zhu, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Video diffusion models have shown great potential in generating high-quality videos,
making them an increasingly popular focus. However, their inherent iterative nature leads to …

Cited by: 5

From slow bidirectional to fast causal video generators

T Yin, Q Zhang, R Zhang, WT Freeman… - arXiv preprint arXiv …, 2024 - arxiv.org

Current video diffusion models achieve impressive generation quality but struggle in
interactive applications due to bidirectional attention dependencies. The generation of a …

Cited by: 1

Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

T Zhang, L Wang, X Zhang, Y Zhang, B Jia… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by
enhancing reasoning capabilities. However, these models remain highly vulnerable to …

Cited by: 1

OnlineVPO: Align video diffusion model with online video-centric preference optimization

J Zhang, J Wu, W Chen, Y Ji, X Xiao, W Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, the field of text-to-video (T2V) generation has made significant strides.
Despite this progress, there is still a gap between theoretical advancements and practical …

Cited by: 1

Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models

P Janowczyk, L Laurier, A Giulietta, A Octavia… - arXiv preprint arXiv …, 2024 - arxiv.org

Multi-Modal Language Models (MLLMs) have transformed artificial intelligence by
combining visual and text data, making applications like image captioning, visual question …


Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving

L Wang, T Zhang, Y Qu, S Liang, Y Chen, A Liu… - arXiv preprint arXiv …, 2025 - arxiv.org

Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by
enhancing reasoning capabilities; however, these models remain highly susceptible to …

Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models

Y Wu, H Wang, Z Chen, D Xu - arXiv preprint arXiv:2411.18375, 2024 - arxiv.org

The high computational cost and slow inference time are major obstacles to deploying the
video diffusion model (VDM) in practical applications. To overcome this, we introduce a new …

DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization

Z Ding, C Jin, D Liu, H Zheng, KK Singh… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion probabilistic models have shown significant progress in video generation;
however, their computational efficiency is limited by the large number of sampling steps …

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Y Wu, Z Zhang, Y Li, Y Xu, A Kag, Y Sui… - arXiv preprint arXiv …, 2024 - arxiv.org

We have witnessed the unprecedented success of diffusion-based video generation over
the past year. Recently proposed models from the community have wielded the power to …