Fast high-resolution image synthesis with latent adversarial diffusion distillation

A Sauer, F Boesel, T Dockhorn, A Blattmann… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
Diffusion models are the main driver of progress in image and video synthesis, but suffer
from slow inference speed. Distillation methods, like the recently introduced adversarial …

OSV: One step is enough for high-quality image to video generation

X Mao, Z Jiang, FY Wang, W Zhu, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Video diffusion models have shown great potential in generating high-quality videos,
making them an increasingly popular focus. However, their inherent iterative nature leads to …

From slow bidirectional to fast causal video generators

T Yin, Q Zhang, R Zhang, WT Freeman… - arXiv preprint arXiv …, 2024 - arxiv.org
Current video diffusion models achieve impressive generation quality but struggle in
interactive applications due to bidirectional attention dependencies. The generation of a …

Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

T Zhang, L Wang, X Zhang, Y Zhang, B Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by
enhancing reasoning capabilities. However, these models remain highly vulnerable to …

OnlineVPO: Align video diffusion model with online video-centric preference optimization

J Zhang, J Wu, W Chen, Y Ji, X Xiao, W Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, the field of text-to-video (T2V) generation has made significant strides.
Despite this progress, there is still a gap between theoretical advancements and practical …

Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models

P Janowczyk, L Laurier, A Giulietta, A Octavia… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-Modal Language Models (MLLMs) have transformed artificial intelligence by
combining visual and text data, making applications like image captioning, visual question …

Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving

L Wang, T Zhang, Y Qu, S Liang, Y Chen, A Liu… - arXiv preprint arXiv …, 2025 - arxiv.org
Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by
enhancing reasoning capabilities; however, these models remain highly susceptible to …

Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models

Y Wu, H Wang, Z Chen, D Xu - arXiv preprint arXiv:2411.18375, 2024 - arxiv.org
The high computational cost and slow inference time are major obstacles to deploying the
video diffusion model (VDM) in practical applications. To overcome this, we introduce a new …

DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization

Z Ding, C Jin, D Liu, H Zheng, KK Singh… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion probabilistic models have shown significant progress in video generation;
however, their computational efficiency is limited by the large number of sampling steps …

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Y Wu, Z Zhang, Y Li, Y Xu, A Kag, Y Sui… - arXiv preprint arXiv …, 2024 - arxiv.org
We have witnessed the unprecedented success of diffusion-based video generation over
the past year. Recently proposed models from the community have wielded the power to …