Distilled feature fields enable few-shot language-guided manipulation

W Shen, G Yang, A Yu, J Wong, LP Kaelbling… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised and language-supervised image models contain rich knowledge of the
world that is important for generalization. Many robotic tasks, however, require a detailed …

Neural volumetric memory for visual locomotion control

R Yang, G Yang, X Wang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Legged robots have the potential to expand the reach of autonomy beyond paved roads. In
this work, we consider the difficult problem of locomotion on challenging terrains using a …

Simple-bev: What really matters for multi-sensor bev perception?

AW Harley, Z Fang, J Li, R Ambrus… - … on Robotics and …, 2023 - ieeexplore.ieee.org
Building 3D perception systems for autonomous vehicles that do not rely on high-density
LiDAR is a critical research problem because of the expense of LiDAR systems compared to …

Collossl: Collaborative self-supervised learning for human activity recognition

Y Jain, CI Tang, C Min, F Kawsar… - Proceedings of the ACM on …, 2022 - dl.acm.org
A major bottleneck in training robust Human-Activity Recognition models (HAR) is the need
for large-scale labeled sensor datasets. Because labeling large amounts of sensor data is …

Act3d: 3d feature field transformers for multi-task robotic manipulation

T Gervet, Z Xian, N Gkanatsios… - arXiv preprint arXiv …, 2023 - arxiv.org
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …

Lifting Multi-View Detection and Tracking to the Bird's Eye View

T Teepe, P Wolters, J Gilg… - Proceedings of the …, 2024 - openaccess.thecvf.com
Taking advantage of multi-view aggregation presents a promising solution to tackle
challenges such as occlusion and missed detection in multi-object tracking and detection …

Learning 3d dynamic scene representations for robot manipulation

Z Xu, Z He, J Wu, S Song - arXiv preprint arXiv:2011.01968, 2020 - arxiv.org
3D scene representation for robot manipulation should capture three key object properties:
permanency--objects that become occluded over time continue to exist; amodal …

Video autoencoder: self-supervised disentanglement of static 3d structure and motion

Z Lai, S Liu, AA Efros, X Wang - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We present Video Autoencoder for learning disentangled representations of 3D
structure and camera pose from videos in a self-supervised manner. Relying on temporal …

Regulating intermediate 3d features for vision-centric autonomous driving

J Xu, L Peng, H Cheng, L Xia, Q Zhou… - Proceedings of the …, 2024 - ojs.aaai.org
Multi-camera perception tasks have gained significant attention in the field of autonomous
driving. However, existing frameworks based on Lift-Splat-Shoot (LSS) in the multi-camera …

A neural rendering framework for free-viewpoint relighting

Z Chen, A Chen, G Zhang, C Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com
We present a novel Relightable Neural Renderer (RNR) for simultaneous view synthesis
and relighting using multi-view image inputs. Existing neural rendering (NR) does not …