Diffusion models in vision: A survey

FA Croitoru, V Hondru, RT Ionescu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Denoising diffusion models represent a recent emerging topic in computer vision,
demonstrating remarkable results in the area of generative modeling. A diffusion model is a …

Medical image segmentation review: The success of U-Net

R Azad, EK Aghdam, A Rauland, Y Jia… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Automatic medical image segmentation is a crucial topic in the medical domain and
successively a critical counterpart in the computer-aided diagnosis paradigm. U-Net is the …

VoxPoser: Composable 3D value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

VM-UNet: Vision Mamba UNet for medical image segmentation

J Ruan, J Li, S Xiang - arXiv preprint arXiv:2402.02491, 2024 - arxiv.org
In the realm of medical image segmentation, both CNN-based and Transformer-based
models have been extensively explored. However, CNNs exhibit limitations in long-range …

Lumiere: A space-time diffusion model for video generation

O Bar-Tal, H Chefer, O Tov, C Herrmann… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We introduce Lumiere–a text-to-video diffusion model designed for synthesizing videos that
portray realistic, diverse and coherent motion–a pivotal challenge in video synthesis. To this …

Video diffusion models

J Ho, T Salimans, A Gritsenko… - Advances in …, 2022 - proceedings.neurips.cc
Generating temporally coherent high fidelity video is an important milestone in generative
modeling research. We make progress towards this milestone by proposing a diffusion …

ModelScope text-to-video technical report

J Wang, H Yuan, D Chen, Y Zhang, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a
text-to-image synthesis model (i.e., Stable Diffusion). ModelScopeT2V incorporates spatio …

Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier
The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Conditional image-to-video generation with latent flow diffusion models

H Ni, C Shi, K Li, SX Huang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video
starting from an image (e.g., a person's face) and a condition (e.g., an action class label like …

An effective CNN and Transformer complementary network for medical image segmentation

F Yuan, Z Zhang, Z Fang - Pattern Recognition, 2023 - Elsevier
The Transformer network was originally proposed for natural language processing. Due to
its powerful representation ability for long-range dependency, it has been extended for …