BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models

F Wang, H Yin, Y Dong, H Zhu, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The inversion of diffusion model sampling, which aims to find the corresponding initial noise
of a sample, plays a critical role in various tasks. Recently, several heuristic exact inversion …

Personalized Image Generation with Deep Generative Models: A Decade Survey

Y Wei, Y Zheng, Y Zhang, M Liu, Z Ji, L Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advancements in generative models have significantly facilitated the development of
personalized content creation. Given a small set of images with user-specific concept …

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos

M Cao, H Tang, H Zhao, H Guo, J Liu, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in video-based large language models (Video LLMs) have witnessed
the emergence of diverse capabilities to reason and interpret dynamic visual content …

Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions

X Shuai, H Ding, Z Qin, H Luo, X Ma, D Tao - arXiv preprint arXiv …, 2025 - arxiv.org
Controlling the movements of dynamic objects and the camera within generated videos is a
meaningful yet challenging task. Due to the lack of datasets with comprehensive motion …

Federated Incremental Named Entity Recognition

D Zhang, Y Yu, C Li, J Dong, D Yu - arXiv preprint arXiv:2411.11623, 2024 - arxiv.org
Federated Named Entity Recognition (FNER) boosts model training within each local client
by aggregating the model updates of decentralized local clients, without sharing their private …

IPDM: Identity Preserving Diffusion Model for Face Sketch and Photo Synthesis

D Tang, X Jiang, Y Zhang, Y Dai, Y Lin - Machine Vision and Applications, 2025 - Springer
Face sketch and photo synthesis is widely applied in industry and information fields, such as
entertainment business and heterogeneous face retrieval. The key challenge lies in …

Continual Visual Instruction Tuning

M Cao, Y Liu, Y Liu, T Wang, J Dong, H Ding, X Zhang… - researchgate.net
Instruction tuning constitutes a prevalent technique for tailoring Large Vision Language
Models (LVLMs) to meet individual task requirements. To date, most of the existing …