- Academic Search

Lerf: Language embedded radiance fields

J Kerr, CM Kim, K Goldberg… - Proceedings of the …, 2023 - openaccess.thecvf.com

Humans describe the physical world using natural language to refer to specific 3D locations
based on a vast range of properties: visual appearance, semantics, abstract associations, or …

Cited by 329. All 6 versions.

Langsplat: 3d language gaussian splatting

M Qin, W Li, J Zhou, H Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Humans live in a 3D world and commonly use natural language to interact with a 3D scene.
Modeling a 3D language field to support open-ended language queries in 3D has gained …

Cited by 126. All 7 versions.

Task me anything

J Zhang, W Huang, Z Ma, O Michel, D He… - arXiv preprint arXiv …, 2024 - arxiv.org

Benchmarks for large multimodal language models (MLMs) now serve to simultaneously
assess the general capabilities of models instead of evaluating for a specific capability. As a …

Cited by 43. All 7 versions.

Going beyond nouns with vision & language models using synthetic data

P Cascante-Bonilla, K Shehada… - Proceedings of the …, 2023 - openaccess.thecvf.com

Large-scale pre-trained Vision & Language (VL) models have shown remarkable
performance in many applications, enabling replacing a fixed set of supported classes with …

Cited by 45. All 12 versions.

Language-driven grasp detection

AD Vuong, MN Vu, B Huang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Grasp detection is a persistent and intricate challenge with various industrial applications.
Recently many methods and datasets have been proposed to tackle the grasp detection …

Cited by 14. All 7 versions.

Fmgs: Foundation model embedded 3d gaussian splatting for holistic 3d scene understanding

X Zuo, P Samangouei, Y Zhou, Y Di, M Li - International Journal of …, 2024 - Springer

Precisely perceiving the geometric and semantic properties of real-world 3D objects is
crucial for the continued evolution of augmented reality and robotic applications. To this end …

Cited by 29. All 4 versions.

Prompt-guided zero-shot anomaly action recognition using pretrained deep skeleton features

F Sato, R Hachiuma, T Sekii - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

This study investigates unsupervised anomaly action recognition, which identifies video-
level abnormal-human-behavior events in an unsupervised manner without abnormal …

Cited by 33. All 5 versions.

Swapmix: Diagnosing and regularizing the over-reliance on visual context in visual question answering

V Gupta, Z Li, A Kortylewski, C Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com

While Visual Question Answering (VQA) has progressed rapidly, previous works
raise concerns about robustness of current VQA models. In this work, we study the …

Cited by 60. All 8 versions.

Dual learning with dynamic knowledge distillation for partially relevant video retrieval

J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …

Cited by 14. All 4 versions.

Earthvqa: Towards queryable earth via relational reasoning-based remote sensing visual question answering

J Wang, Z Zheng, Z Chen, A Ma, Y Zhong - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Earth vision research typically focuses on extracting geospatial object locations and
categories but neglects the exploration of relations between objects and comprehensive …

Cited by 15. All 6 versions.

