Diffusion models: A comprehensive survey of methods and applications

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Segment everything everywhere all at once

X Zou, J Yang, H Zhang, F Li, L Li… - Advances in …, 2024 - proceedings.neurips.cc
In this work, we present SEEM, a promotable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional clip

Q Yu, J He, X Deng, X Shen… - Advances in Neural …, 2023 - proceedings.neurips.cc
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …

Segment and Recognize Anything at Any Granularity

F Li, H Zhang, P Sun, X Zou, S Liu, C Li, J Yang… - … on Computer Vision, 2024 - Springer
In this work, we introduce Semantic-SAM, an augmented image segmentation foundation for
segmenting and recognizing anything at desired granularities. Compared to the …

A simple framework for open-vocabulary segmentation and detection

H Zhang, F Li, X Zou, S Liu, C Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this work, we present OpenSeeD, a simple Open-vocabulary Segmentation and Detection
framework that learns from different segmentation and detection datasets. To bridge the gap …

Recognize anything: A strong image tagging model

Y Zhang, X Huang, J Ma, Z Li, Z Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract We present the Recognize Anything Model (RAM): a strong foundation model for
image tagging. RAM makes a substantial step for foundation models in computer vision …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …