Diffusion models: A comprehensive survey of methods and applications
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …
record-breaking performance in many applications, including image synthesis, video …
Foundation Models Defining a New Era in Vision: a Survey and Outlook
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …
fundamental to understanding our world. The complex relations between objects and their …
Segment everything everywhere all at once
In this work, we present SEEM, a promotable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …
Emergent correspondence from image diffusion
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …
this paper, we show that correspondence emerges in image diffusion models without any …
Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional clip
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …
objects from an open set of categories in diverse environments. One way to address this …
Segment and Recognize Anything at Any Granularity
In this work, we introduce Semantic-SAM, an augmented image segmentation foundation for
segmenting and recognizing anything at desired granularities. Compared to the …
segmenting and recognizing anything at desired granularities. Compared to the …
A simple framework for open-vocabulary segmentation and detection
In this work, we present OpenSeeD, a simple Open-vocabulary Segmentation and Detection
framework that learns from different segmentation and detection datasets. To bridge the gap …
framework that learns from different segmentation and detection datasets. To bridge the gap …
Recognize anything: A strong image tagging model
Abstract We present the Recognize Anything Model (RAM): a strong foundation model for
image tagging. RAM makes a substantial step for foundation models in computer vision …
image tagging. RAM makes a substantial step for foundation models in computer vision …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
methods to data compression. Recent advances in statistical machine learning have opened …
Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
segments or groups. This technique has numerous real-world applications, such as …