Large motion model for unified multi-modal motion generation

M Zhang, D Jin, C Gu, F Hong, Z Cai, J Huang… - … on Computer Vision, 2024 - Springer
Human motion generation, a cornerstone technique in animation and video production, has
widespread applications in various tasks like text-to-motion and music-to-dance. Previous …

Exploiting Diffusion Prior for Generalizable Dense Prediction

HY Lee, HY Tseng, MH Yang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes
too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable …

Generative models: What do they know? Do they know things? Let's find out!

X Du, N Kolkin, G Shakhnarovich, A Bhattad - arXiv preprint arXiv …, 2023 - arxiv.org
Generative models excel at mimicking real scenes, suggesting they might inherently encode
important intrinsic scene properties. In this paper, we aim to explore the following key …

Analogist: Out-of-the-box visual in-context learning with image diffusion model

Z Gu, S Yang, J Liao, J Huo, Y Gao - ACM Transactions on Graphics …, 2024 - dl.acm.org
Visual In-Context Learning (ICL) has emerged as a promising research area due to its
capability to accomplish various tasks with limited example pairs through analogical …

InstructGIE: Towards generalizable image editing

Z Meng, C Yang, J Liu, H Tang, P Zhao… - European Conference on …, 2024 - Springer
Recent advances in image editing have been driven by the development of denoising
diffusion models, marking a significant leap forward in this field. Despite these advances, the …

MEVG: Multi-event video generation with text-to-video models

G Oh, J Jeong, S Kim, W Byeon, J Kim, S Kim… - European Conference on …, 2024 - Springer
We introduce a novel diffusion-based video generation method, generating a video showing
multiple events given multiple individual sentences from the user. Our method does not …

MTVG: Multi-text video generation with text-to-video models

G Oh, J Jeong, S Kim, W Byeon, J Kim, S Kim… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, video generation has attracted massive attention and yielded noticeable
outcomes. Concerning the characteristics of video, multi-text conditioning incorporating …

A survey on data augmentation in large model era

Y Zhou, C Guo, X Wang, Y Chang, Y Wu - arXiv preprint arXiv:2401.15422, 2024 - arxiv.org
Large models, encompassing large language and diffusion models, have shown
exceptional promise in approximating human-level intelligence, garnering significant …

Edit One for All: Interactive Batch Image Editing

T Nguyen, U Ojha, Y Li, H Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
In recent years image editing has advanced remarkably. With increased human control it is
now possible to edit an image in a plethora of ways; from specifying in text what we want to …

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …