EDA: Explicit text-decoupling and dense alignment for 3D visual grounding
3D visual grounding aims to find the object within point clouds mentioned by free-form
natural language descriptions with rich semantic cues. However, existing methods …
FILM: Following instructions in language with modular methods
SY Min, DS Chaplot, P Ravikumar, Y Bisk… - ar** through instruction following
Humans, even at a very early age, can learn visual concepts and understand geometry and
layout through active interaction with the environment, and generalize their compositions to …
Episodic memory question answering
Egocentric augmented reality devices such as wearable glasses passively capture visual
data as a human wearer tours a home environment. We envision a scenario wherein the …
Learning 3D dynamic scene representations for robot manipulation
3D scene representation for robot manipulation should capture three key object properties:
permanency--objects that become occluded over time continue to exist; amodal …
Four ways to improve verbo-visual fusion for dense 3D visual grounding
3D visual grounding is the task of localizing the object in a 3D scene which is
referred to by a description in natural language. With a wide range of applications ranging from …
Visual language navigation: A survey and open challenges
SM Park, YG Kim - Artificial Intelligence Review, 2023 - Springer
With the recent development of deep learning, AI models are widely used in various
domains. AI models show good performance for definite tasks such as image classification …
Fast and explicit neural view synthesis
We study the problem of novel view synthesis from sparse source observations of a scene
comprised of 3D objects. We propose a simple yet effective approach that is neither …
Voxel-informed language grounding
Natural language applied to natural 2D images describes a fundamentally 3D world. We
present the Voxel-informed Language Grounder (VLG), a language grounding model that …
Multi-Attribute Interactions Matter for 3D Visual Grounding
3D visual grounding aims to localize 3D objects described by free-form language
sentences. Following the detection-then-matching paradigm, existing methods mainly focus …