State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

CityDreamer: Compositional generative model of unbounded 3D cities

H Xie, Z Chen, F Hong, Z Liu - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
3D city generation is a desirable yet challenging task since humans are more
sensitive to structural distortions in urban environments. Additionally, generating 3D cities is …

MME-Survey: A comprehensive survey on evaluation of multimodal LLMs

C Fu, YF Zhang, S Yin, B Li, X Fang, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …

MetaEarth: A generative foundation model for global-scale remote sensing image generation

Z Yu, C Liu, L Liu, Z Shi, Z Zou - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
The recent advancement of generative foundational models has ushered in a new era of
image generation in the realm of natural images, revolutionizing art design, entertainment …

Object-aware inversion and reassembly for image editing

Z Yang, G Ding, W Wang, H Chen, B Zhuang… - arXiv preprint arXiv …, 2023 - arxiv.org
By comparing the original and target prompts, we can obtain numerous editing pairs, each
comprising an object and its corresponding editing target. To allow editability while …

DreamStory: Open-domain story visualization by LLM-guided multi-subject consistent diffusion

H He, H Yang, Z Tuo, Y Zhou, Q Wang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Story visualization aims to create visually compelling images or videos corresponding to
textual narratives. Despite recent advances in diffusion models yielding promising results …

UnmixDiff: Unmixing-Based Diffusion Model for Hyperspectral Image Synthesis

Y Yu, E Pan, Y Ma, X Mei, Q Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The scarcity of hyperspectral images (HSIs) hinders the development of processing methods
and downstream applications. HSI synthesis, which aims to generate realistic samples from …

Instruction-guided editing controls for images and multimedia: A survey in LLM era

TT Nguyen, Z Ren, T Pham, PL Nguyen, H Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of large language models (LLMs) and multimodal learning has
transformed digital content creation and manipulation. Traditional visual editing tools require …

Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model

C Liu, K Chen, R Zhao, Z Zou, Z Shi - arXiv preprint arXiv:2501.00895, 2025 - arxiv.org
Generative foundation models have advanced large-scale text-driven natural image
generation, becoming a prominent research trend across various vertical domains …

TGIF: Text-guided inpainting forgery dataset

H Mareen, D Karageorgiou… - … and Security (WIFS), 2024 - ieeexplore.ieee.org
Digital image manipulation has become increasingly accessible and realistic with the advent
of generative AI technologies. Recent developments allow for text-guided inpainting, making …