DiffuMask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models

W Wu, Y Zhao, MZ Shou, H Zhou… - Proceedings of the …, 2023 - openaccess.thecvf.com
Collecting and annotating images with pixel-wise labels is time-consuming and laborious. In
contrast, synthetic data can be freely available using a generative model (e.g., DALL-E …

Dream the impossible: Outlier imagination with diffusion models

X Du, Y Sun, J Zhu, Y Li - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Utilizing auxiliary outlier datasets to regularize the machine learning model has
demonstrated promise for out-of-distribution (OOD) detection and safe prediction. Due to the …

M³ViT: Mixture-of-experts vision transformer for efficient multi-task learning with model-accelerator co-design

Z Fan, R Sarkar, Z Jiang, T Chen… - Advances in …, 2022 - proceedings.neurips.cc
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often
lets those tasks learn better jointly. Multi-tasking models have become successful and often …

MosaicFusion: Diffusion models as data augmenters for large vocabulary instance segmentation

J Xie, W Li, X Li, Z Liu, YS Ong, CC Loy - International Journal of …, 2024 - Springer
We present MosaicFusion, a simple yet effective diffusion-based data augmentation
approach for large vocabulary instance segmentation. Our method is training-free and does …

InstanceDiffusion: Instance-level control for image generation

X Wang, T Darrell, SS Rambhatla… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image diffusion models produce high quality images but do not offer control over
individual instances in the image. We introduce InstanceDiffusion that adds precise instance …

WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models

A Marathe, D Ramanan… - Proceedings of the …, 2023 - openaccess.thecvf.com
The open road poses many challenges to autonomous perception, including poor visibility
from extreme weather conditions. Models trained on good-weather datasets frequently fail at …

Improving zero-shot generalization and robustness of multi-modal models

Y Ge, J Ren, A Gallagher, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive
performance on image classification benchmarks and their zero-shot generalization ability is …

LAKE-RED: Camouflaged images generation by latent background knowledge retrieval-augmented diffusion

P Zhao, P Xu, P Qin, DP Fan, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Camouflaged vision perception is an important vision task with numerous practical
applications. Due to the expensive collection and labeling costs, this community struggles …

3D Copy-Paste: Physically plausible object insertion for monocular 3D detection

Y Ge, HX Yu, C Zhao, Y Guo, X Huang… - Advances in …, 2024 - proceedings.neurips.cc
A major challenge in monocular 3D object detection is the limited diversity and quantity of
objects in real datasets. While augmenting real scenes with virtual objects holds promise to …

Neural-Sim: Learning to generate training data with NeRF

Y Ge, H Behl, J Xu, S Gunasekar, N Joshi… - … on Computer Vision, 2022 - Springer
Training computer vision models usually requires collecting and labeling vast amounts of
imagery under a diverse set of scene configurations and properties. This process is …