T-rex2: Towards generic object detection via text-visual prompt synergy

Q Jiang, F Li, Z Zeng, T Ren, S Liu, L Zhang - European Conference on …, 2024 - Springer
We present T-Rex2, a highly practical model for open-set object detection. Previous open-
set object detection methods relying on text prompts effectively encapsulate the abstract …

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

Visual in-context prompting

F Li, Q Jiang, H Zhang, T Ren, S Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In-context prompting in large language models (LLMs) has become a prevalent approach to
improve zero-shot capabilities but this idea is less explored in the vision domain. Existing …

Language-conditioned detection transformer

JH Cho, P Krähenbühl - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We present a new open-vocabulary detection framework. Our framework uses both image-
level labels and detailed detection annotations when available. Our framework proceeds in …

An end-to-end real-world camera imaging pipeline

K Xu, Z Ma, L Xu, G He, Y Li, W Yu, T Han… - Proceedings of the 32nd …, 2024 - dl.acm.org
Recent advances in neural camera imaging pipelines have demonstrated notable progress.
Nevertheless, the real-world imaging pipeline still faces challenges including the lack of joint …

MobileVLM: A vision-language model for better intra- and inter-UI understanding

Q Wu, W Xu, W Liu, T Tan, J Liu, A Li, J Luan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, mobile AI agents based on VLMs have been gaining increasing attention. These
works typically utilize VLM as a foundation, fine-tuning it with instruction-based mobile …

DINO-X: A unified vision model for open-world object detection and understanding

T Ren, Y Chen, Q Jiang, Z Zeng, Y Xiong, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce DINO-X, which is a unified object-centric vision model developed
by IDEA Research with the best open-world object detection performance to date. DINO-X …

OVMR: Open-Vocabulary Recognition with Multi-Modal References

Z Ma, S Zhang, L Wei, Q Tian - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The challenge of open-vocabulary recognition lies in the model has no clue of new
categories it is applied to. Existing works have proposed different methods to embed …

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

H Wang, P Ren, Z Jie, X Dong, C Feng, Y Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
Open-vocabulary detection is a challenging task due to the requirement of detecting objects
based on class names, including those not encountered during training. Existing methods …

Exploring multi-modal contextual knowledge for open-vocabulary object detection

Y Xu, M Zhang, X Yang, C Xu - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
We explore multi-modal contextual knowledge learned through multi-modal masked
language modeling to provide explicit localization guidance for novel classes in open …