Large motion model for unified multi-modal motion generation
Human motion generation, a cornerstone technique in animation and video production, has
widespread applications in various tasks like text-to-motion and music-to-dance. Previous …
Exploiting Diffusion Prior for Generalizable Dense Prediction
Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes
too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable …
Generative models: What do they know? Do they know things? Let's find out!
Generative models excel at mimicking real scenes, suggesting they might inherently encode
important intrinsic scene properties. In this paper, we aim to explore the following key …
Analogist: Out-of-the-box visual in-context learning with image diffusion model
Visual In-Context Learning (ICL) has emerged as a promising research area due to its
capability to accomplish various tasks with limited example pairs through analogical …
InstructGIE: Towards generalizable image editing
Recent advances in image editing have been driven by the development of denoising
diffusion models, marking a significant leap forward in this field. Despite these advances, the …
MEVG: Multi-event video generation with text-to-video models
We introduce a novel diffusion-based video generation method, generating a video showing
multiple events given multiple individual sentences from the user. Our method does not …
MTVG: Multi-text video generation with text-to-video models
Recently, video generation has attracted massive attention and yielded noticeable
outcomes. Concerning the characteristics of video, multi-text conditioning incorporating …
A survey on data augmentation in large model era
Large models, encompassing large language and diffusion models, have shown
exceptional promise in approximating human-level intelligence, garnering significant …
Edit One for All: Interactive Batch Image Editing
In recent years, image editing has advanced remarkably. With increased human control, it is
now possible to edit an image in a plethora of ways: from specifying in text what we want to …
LLMs Meet Multimodal Generation and Editing: A Survey
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …