Fast high-resolution image synthesis with latent adversarial diffusion distillation
Diffusion models are the main driver of progress in image and video synthesis, but suffer
from slow inference speed. Distillation methods, like the recently introduced adversarial …
OSV: One step is enough for high-quality image to video generation
Video diffusion models have shown great potential in generating high-quality videos,
making them an increasingly popular focus. However, their inherent iterative nature leads to …
From slow bidirectional to fast causal video generators
Current video diffusion models achieve impressive generation quality but struggle in
interactive applications due to bidirectional attention dependencies. The generation of a …
Visual Adversarial Attack on Vision-Language Models for Autonomous Driving
Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by
enhancing reasoning capabilities. However, these models remain highly vulnerable to …
OnlineVPO: Align video diffusion model with online video-centric preference optimization
In recent years, the field of text-to-video (T2V) generation has made significant strides.
Despite this progress, there is still a gap between theoretical advancements and practical …
Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models
P Janowczyk, L Laurier, A Giulietta, A Octavia… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-Modal Language Models (MLLMs) have transformed artificial intelligence by
combining visual and text data, making applications like image captioning, visual question …
Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving
Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by
enhancing reasoning capabilities; however, these models remain highly susceptible to …
Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models
The high computational cost and slow inference time are major obstacles to deploying the
video diffusion model (VDM) in practical applications. To overcome this, we introduce a new …
DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
Diffusion probabilistic models have shown significant progress in video generation;
however, their computational efficiency is limited by the large number of sampling steps …
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
We have witnessed the unprecedented success of diffusion-based video generation over
the past year. Recently proposed models from the community have wielded the power to …