Development of metaverse for intelligent healthcare

G Wang, A Badal, X Jia, JS Maltz, K Mueller… - Nature Machine …, 2022 - nature.com
The metaverse integrates physical and virtual realities, enabling humans and their avatars to
interact in an environment supported by technologies such as high-speed internet, virtual …

Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Align your latents: High-resolution video synthesis with latent diffusion models

A Blattmann, R Rombach, H Ling… - Proceedings of the …, 2023 - openaccess.thecvf.com
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding
excessive compute demands by training a diffusion model in a compressed lower …

NExT-GPT: Any-to-any multimodal LLM

S Wu, H Fei, L Qu, W Ji, TS Chua - Forty-first International …, 2024 - openreview.net
While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides,
they mostly fall prey to the limitation of only input-side multimodal understanding, without the …

Text2Video-Zero: Text-to-image diffusion models are zero-shot video generators

L Khachatryan, A Movsisyan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-video generation approaches rely on computationally heavy training and
require large-scale video datasets. In this paper, we introduce a new task, zero-shot text-to …

AnimateDiff: Animate your personalized text-to-image diffusion models without specific tuning

Y Guo, C Yang, A Rao, Z Liang, Y Wang, Y Qiao… - arXiv preprint arXiv …, 2023 - arxiv.org
With the advance of text-to-image (T2I) diffusion models (e.g., Stable Diffusion) and
corresponding personalization techniques such as DreamBooth and LoRA, everyone can …

Repurposing diffusion-based image generators for monocular depth estimation

B Ke, A Obukhov, S Huang, N Metzger… - Proceedings of the …, 2024 - openaccess.thecvf.com
Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding, so it is not …

MVDream: Multi-view diffusion for 3D generation

Y Shi, P Wang, J Ye, M Long, K Li, X Yang - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce MVDream, a diffusion model that is able to generate consistent multi-view
images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

BLIP-Diffusion: Pre-trained subject representation for controllable text-to-image generation and editing

D Li, J Li, S Hoi - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Subject-driven text-to-image generation models create novel renditions of an input subject
based on text prompts. Existing models suffer from lengthy fine-tuning and difficulties …