T-Rex2: Towards generic object detection via text-visual prompt synergy
We present T-Rex2, a highly practical model for open-set object detection. Previous open-
set object detection methods relying on text prompts effectively encapsulate the abstract …
A survey on open-vocabulary detection and segmentation: Past, present, and future
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …
Visual in-context prompting
In-context prompting in large language models (LLMs) has become a prevalent approach to
improve zero-shot capabilities but this idea is less explored in the vision domain. Existing …
Language-conditioned detection transformer
We present a new open-vocabulary detection framework. Our framework uses both image-
level labels and detailed detection annotations when available. Our framework proceeds in …
An end-to-end real-world camera imaging pipeline
Recent advances in neural camera imaging pipelines have demonstrated notable progress.
Nevertheless, the real-world imaging pipeline still faces challenges including the lack of joint …
MobileVLM: A vision-language model for better intra- and inter-UI understanding
Q Wu, W Xu, W Liu, T Tan, J Liu, A Li, J Luan… - arxiv preprint arxiv …, 2024 - arxiv.org
Recently, mobile AI agents based on VLMs have been gaining increasing attention. These
works typically utilize VLM as a foundation, fine-tuning it with instruction-based mobile …
DINO-X: A unified vision model for open-world object detection and understanding
In this paper, we introduce DINO-X, which is a unified object-centric vision model developed
by IDEA Research with the best open-world object detection performance to date. DINO-X …
OVMR: Open-Vocabulary Recognition with Multi-Modal References
The challenge of open-vocabulary recognition lies in the model has no clue of new
categories it is applied to. Existing works have proposed different methods to embed …
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Open-vocabulary detection is a challenging task due to the requirement of detecting objects
based on class names, including those not encountered during training. Existing methods …
Exploring multi-modal contextual knowledge for open-vocabulary object detection
We explore multi-modal contextual knowledge learned through multi-modal masked
language modeling to provide explicit localization guidance for novel classes in open …