Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Cited by 124 · 10 versions

EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain

W Zhang, M Cai, T Zhang, Y Zhuang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Multimodal large language models (MLLMs) have demonstrated remarkable success in
vision and visual-language tasks within the natural image domain. Owing to the significant …

Cited by 80 · 3 versions

MMRo: Are multimodal LLMs eligible as the brain for in-home robotics?

J Li, Y Zhu, Z Xu, J Gu, M Zhu, X Liu, N Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

It is fundamentally challenging for robots to serve as useful assistants in human
environments because this requires addressing a spectrum of sub-problems across robotics …

Cited by 7 · 3 versions

3D-GRES: Generalized 3D referring expression segmentation

C Wu, Y Liu, J Ji, Y Ma, H Wang, G Luo… - Proceedings of the …, 2024 - dl.acm.org

3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific
instance within a 3D space based on a natural language description. However, current …

Cited by 4 · 5 versions

DINO-X: A unified vision model for open-world object detection and understanding

T Ren, Y Chen, Q Jiang, Z Zeng, Y Xiong, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we introduce DINO-X, which is a unified object-centric vision model developed
by IDEA Research with the best open-world object detection performance to date. DINO-X …

Cited by 3 · 2 versions

Learning visual grounding from generative vision and language model

S Wang, D Kim, A Taalimi, C Sun, W Kuo - arXiv preprint arXiv:2407.14563, 2024 - arxiv.org

Visual grounding tasks aim to localize image regions based on natural language references.
In this work, we explore whether generative VLMs predominantly trained on image-text data …

Cited by 3 · 2 versions

Auto cherry-picker: Learning from high-quality generative data driven by language

Y Chen, X Li, Y Li, Y Zeng, J Wu, X Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion-based models have shown great potential in generating high-quality images with
various layouts, which can benefit downstream perception tasks. However, a fully automatic …

Cited by 2 · 3 versions

RoboCup@Home 2024 OPL winner NimbRo: Anthropomorphic service robots using foundation models for perception and planning

R Memmesheimer, J Nogga, B Pätzold… - arXiv preprint arXiv …, 2024 - arxiv.org

We present the approaches and contributions of the winning team NimbRo@Home at the
RoboCup@Home 2024 competition in the Open Platform League held in Eindhoven, NL …

Cited by 2 · 4 versions

CamoEnv: Transferable and environment-consistent adversarial camouflage in autonomous driving

Z Zhu, X Yang, H Su, S Zheng - Pattern Recognition Letters, 2025 - Elsevier

Adversarial camouflage has garnered significant attention in the security literature on
autonomous driving. The ability to adapt to various angles makes adversarial camouflage …

2 versions

DynamicEarth: How Far are We from Open-Vocabulary Change Detection?

K Li, X Cao, Y Deng, C Pang, Z Xin, D Meng… - arXiv preprint arXiv …, 2025 - arxiv.org

Monitoring Earth's evolving land covers requires methods capable of detecting changes
across a wide range of categories and contexts. Existing change detection methods are …

2 versions

Saved in My library

An open and comprehensive pipeline for unified object grounding and detection

Towards open vocabulary learning: A survey