- Academic Search

T-Rex2: Towards generic object detection via text-visual prompt synergy

Q Jiang, F Li, Z Zeng, T Ren, S Liu, L Zhang - European Conference on …, 2024 - Springer

We present T-Rex2, a highly practical model for open-set object detection. Previous open-
set object detection methods relying on text prompts effectively encapsulate the abstract …

Cited by 22, all 2 versions

[PDF] arxiv.org

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org

As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

Cited by 24, all 7 versions

[PDF] thecvf.com

Visual in-context prompting

F Li, Q Jiang, H Zhang, T Ren, S Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

In-context prompting in large language models (LLMs) has become a prevalent approach to
improve zero-shot capabilities but this idea is less explored in the vision domain. Existing …

Cited by 26, all 3 versions

[PDF] thecvf.com

Language-conditioned detection transformer

JH Cho, P Krähenbühl - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

We present a new open-vocabulary detection framework. Our framework uses both image-
level labels and detailed detection annotations when available. Our framework proceeds in …

Cited by 4, all 3 versions

[PDF] arxiv.org

An end-to-end real-world camera imaging pipeline

K Xu, Z Ma, L Xu, G He, Y Li, W Yu, T Han… - Proceedings of the 32nd …, 2024 - dl.acm.org

Recent advances in neural camera imaging pipelines have demonstrated notable progress.
Nevertheless, the real-world imaging pipeline still faces challenges including the lack of joint …

Cited by 7, all 5 versions

[PDF] arxiv.org

MobileVLM: A vision-language model for better intra- and inter-UI understanding

Q Wu, W Xu, W Liu, T Tan, J Liu, A Li, J Luan… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, mobile AI agents based on VLMs have been gaining increasing attention. These
works typically utilize VLM as a foundation, fine-tuning it with instruction-based mobile …

Cited by 6, all 4 versions

[PDF] arxiv.org

DINO-X: A unified vision model for open-world object detection and understanding

T Ren, Y Chen, Q Jiang, Z Zeng, Y Xiong, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we introduce DINO-X, which is a unified object-centric vision model developed
by IDEA Research with the best open-world object detection performance to date. DINO-X …

Cited by 3

[PDF] thecvf.com

OVMR: Open-Vocabulary Recognition with Multi-Modal References

Z Ma, S Zhang, L Wei, Q Tian - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

The challenge of open-vocabulary recognition lies in the model has no clue of new
categories it is applied to. Existing works have proposed different methods to embed …

Cited by 3, all 3 versions

[PDF] arxiv.org

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

H Wang, P Ren, Z Jie, X Dong, C Feng, Y Qian… - arXiv preprint arXiv …, 2024 - arxiv.org

Open-vocabulary detection is a challenging task due to the requirement of detecting objects
based on class names, including those not encountered during training. Existing methods …

Cited by 2, all 3 versions

[PDF] arxiv.org

Exploring multi-modal contextual knowledge for open-vocabulary object detection

Y Xu, M Zhang, X Yang, C Xu - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org

We explore multi-modal contextual knowledge learned through multi-modal masked
language modeling to provide explicit localization guidance for novel classes in open …

Cited by 4, all 2 versions

Multi-modal queried object detection in the wild

T-rex2: Towards generic object detection via text-visual prompt synergy
