LISA: Reasoning segmentation via large language model
Although perception systems have made remarkable advancements in recent years, they still
rely on explicit human instruction or pre-defined categories to identify the target objects …
Segment and Recognize Anything at Any Granularity
In this work, we introduce Semantic-SAM, an augmented image segmentation foundation for
segmenting and recognizing anything at desired granularities. Compared to the …
What does a platypus look like? Generating customized prompts for zero-shot image classification
Open-vocabulary models are a promising new paradigm for image classification. Unlike
traditional classification models, open-vocabulary models classify among any arbitrary set of …
SemMAE: Semantic-guided masking for learning masked autoencoders
Recently, significant progress has been made in masked image modeling to catch up to
masked language modeling. However, unlike words in NLP, the lack of semantic …
PACO: Parts and attributes of common objects
Object models are gradually progressing from predicting just category labels to providing
detailed descriptions of object instances. This motivates the need for large datasets which …
Osprey: Pixel understanding with visual instruction tuning
Multimodal large language models (MLLMs) have recently achieved impressive general-purpose
vision-language capabilities through visual instruction tuning. However, current …
Going denser with open-vocabulary part segmentation
Object detection has been expanded from a limited number of categories to open
vocabulary. Moving forward, a complete intelligent vision system requires understanding …
PIP-Net: Patch-based intuitive prototypes for interpretable image classification
Interpretable methods based on prototypical patches recognize various components in an
image in order to explain their reasoning to humans. However, existing prototype-based …
Dataset pruning: Reducing training data by examining generalization influence
The great success of deep learning heavily relies on increasingly larger training data, which
comes at a price of huge computational and infrastructural costs. This poses crucial …
Animal3D: A comprehensive dataset of 3D animal pose and shape
Accurately estimating the 3D pose and shape is an essential step towards understanding
animal behavior, and can potentially benefit many downstream applications, such as wildlife …