Aligning cyber space with physical world: A comprehensive survey on embodied ai

Y Liu, W Chen, Y Bai, X Liang, G Li, W Gao… - ar…

Gaussian Grouping: Segment and edit anything in 3d scenes

M Ye, M Danelljan, F Yu, L Ke - European Conference on Computer …, 2024 - Springer
The recent Gaussian Splatting achieves high-quality and real-time novel-view
synthesis of the 3D scenes. However, it is solely concentrated on the appearance and …

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

3d-vista: Pre-trained transformer for 3d vision and text alignment

Z Zhu, X Ma, Y Chen, Z Deng… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D vision-language grounding (3D-VL) is an emerging field that aims to connect the
3D physical world with natural language, which is crucial for achieving embodied …

Nerflets: Local radiance fields for efficient structure-aware 3d scene representation from 2d supervision

X Zhang, A Kundu, T Funkhouser… - Proceedings of the …, 2023 - openaccess.thecvf.com
We address efficient and structure-aware 3D scene representation from images. Nerflets are
our key contribution--a set of local neural radiance fields that together represent a scene …

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

B Jia, Y Chen, H Yu, Y Wang, X Niu, T Liu, Q Li… - … on Computer Vision, 2024 - Springer
3D vision-language (3D-VL) grounding, which aims to align language with 3D
physical environments, stands as a cornerstone in developing embodied agents. In …

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

Interactive medical image annotation using improved Attention U-net with compound geodesic distance

Y Zhang, J Chen, X Ma, G Wang, UA Bhatti… - Expert systems with …, 2024 - Elsevier
Accurate and massive medical image annotation data is crucial for diagnosis, surgical
planning, and deep learning in the development of medical images. However, creating large …

Mask-attention-free transformer for 3d instance segmentation

X Lai, Y Yuan, R Chu, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, transformer-based methods have dominated 3D instance segmentation, where
mask attention is commonly involved. Specifically, object queries are guided by the initial …

Human-centric scene understanding for 3d large-scale scenarios

Y Xu, P Cong, Y Yao, R Chen, Y Hou… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human-centric scene understanding is significant for real-world applications, but it is
extremely challenging due to the existence of diverse human poses and actions, complex …