Dataset diffusion: Diffusion-based synthetic data generation for pixel-level semantic segmentation

Q Nguyen, T Vu, A Tran… - Advances in Neural …, 2023 - proceedings.neurips.cc
Preparing training data for deep vision models is a labor-intensive task. To address this,
generative models have emerged as an effective solution for generating synthetic data …

Understanding the latent space of diffusion models through the lens of Riemannian geometry

YH Park, M Kwon, J Choi, J Jo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite the success of diffusion models (DMs), we still lack a thorough understanding of
their latent space. To understand the latent space $\mathbf{x}_t \in \mathcal{X}$, we …

Uncovering prototypical knowledge for weakly open-vocabulary semantic segmentation

F Zhang, T Zhou, B Li, H He, C Ma… - Advances in …, 2023 - proceedings.neurips.cc
This paper studies the problem of weakly open-vocabulary semantic segmentation
(WOVSS), which learns to segment objects of arbitrary classes using mere image-text pairs …

Diffusion models for zero-shot open-vocabulary segmentation

L Karazija, I Laina, A Vedaldi, C Rupprecht - arXiv e-prints, 2023 - ui.adsabs.harvard.edu
The variety of objects in the real world is nearly unlimited and is thus impossible to capture
using models trained on a fixed set of categories. As a result, in recent years, open …

Lexicon3D: Probing visual foundation models for complex 3D scene understanding

Y Man, S Zheng, Z Bao, M Hebert… - Advances in Neural …, 2025 - proceedings.neurips.cc
Complex 3D scene understanding has gained increasing attention, with scene encoding
strategies built on top of visual foundation models playing a crucial role in this success …

TokenCompose: Text-to-image diffusion with token-level supervision

Z Wang, Z Sha, Z Ding, Y Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We present TokenCompose, a Latent Diffusion Model for text-to-image generation
that achieves enhanced consistency between user-specified text prompts and model …

AttrSeg: Open-vocabulary semantic segmentation via attribute decomposition-aggregation

C Ma, Y Yuhuan, C Ju, F Zhang… - Advances in neural …, 2023 - proceedings.neurips.cc
Open-vocabulary semantic segmentation is a challenging task that requires segmenting
novel object categories at inference time. Recent works explore vision-language pre-training …

Distilling vision-language pre-training to collaborate with weakly-supervised temporal action localization

C Ju, K Zheng, J Liu, P Zhao, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action
instances with only category labels. Most methods widely adopt the off-the-shelf …

Turbo: Informativity-driven acceleration plug-in for vision-language large models

C Ju, H Wang, H Cheng, X Chen, Z Zhai… - … on Computer Vision, 2024 - Springer
Vision-Language Large Models (VLMs) have recently become the primary backbone of AI
due to their impressive performance. However, their expensive computation costs, i.e., …

UniGS: Unified representation for image generation and segmentation

L Qi, L Yang, W Guo, Y Xu, B Du… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper introduces a novel unified representation of diffusion models for image
generation and segmentation. Specifically, we use a colormap to represent entity-level …