Aligning cyber space with physical world: A comprehensive survey on embodied AI
Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
3D-VisTA: Pre-trained transformer for 3D vision and text alignment
3D vision-language grounding (3D-VL) is an emerging field that aims to connect the
3D physical world with natural language, which is crucial for achieving embodied …
Nerflets: Local radiance fields for efficient structure-aware 3D scene representation from 2D supervision
We address efficient and structure-aware 3D scene representation from images. Nerflets are
our key contribution--a set of local neural radiance fields that together represent a scene …
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
3D vision-language (3D-VL) grounding, which aims to align language with 3D
physical environments, stands as a cornerstone in developing embodied agents. In …
A survey on open-vocabulary detection and segmentation: Past, present, and future
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …
Interactive medical image annotation using improved Attention U-net with compound geodesic distance
Y Zhang, J Chen, X Ma, G Wang, UA Bhatti… - Expert systems with …, 2024 - Elsevier
Accurate and massive medical image annotation data is crucial for diagnosis, surgical
planning, and deep learning in the development of medical images. However, creating large …
Mask-attention-free transformer for 3D instance segmentation
Recently, transformer-based methods have dominated 3D instance segmentation, where
mask attention is commonly involved. Specifically, object queries are guided by the initial …
Human-centric scene understanding for 3D large-scale scenarios
Human-centric scene understanding is significant for real-world applications, but it is
extremely challenging due to the existence of diverse human poses and actions, complex …