X-Align: Cross-modal cross-view alignment for bird's-eye-view segmentation

S Borse, M Klingner, VR Kumar, H Cai… - Proceedings of the …, 2023 - openaccess.thecvf.com
Bird's-eye-view (BEV) grid is a common representation for the perception of road
components, e.g., drivable area, in autonomous driving. Most existing approaches rely on …

MAMo: Leveraging memory and attention for monocular video depth estimation

R Yasarla, H Cai, J Jeong, Y Shi… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose MAMo, a novel memory and attention framework for monocular video depth
estimation. MAMo can augment and improve any single-image depth estimation networks …

4D panoptic segmentation as invariant and equivariant field prediction

M Zhu, S Han, H Cai, S Borse… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we develop rotation-equivariant neural networks for 4D panoptic
segmentation. 4D panoptic segmentation is a benchmark task for autonomous driving that …

Joint-Task Regularization for Partially Labeled Multi-Task Learning

K Nishi, J Kim, W Li, H Pfister - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Multi-task learning has become increasingly popular in the machine learning field but its
practicality is hindered by the need for large labeled datasets. Most multi-task learning …

PosSAM: Panoptic open-vocabulary segment anything

V VS, S Borse, H Park, D Das, V Patel, M Hayat… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively
unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP …

SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations

JM Lin, J Jeong, H Cai, R Garrepalli… - Proceedings of the …, 2024 - openaccess.thecvf.com
Optical flow estimation is crucial to a variety of vision tasks. Despite substantial recent
advancements, achieving real-time on-device optical flow estimation remains a complex …

Region-Aware Distribution Contrast: A Novel Approach to Multi-task Partially Supervised Learning

M Li, T Li, G Wang, P Wang, Y Yang, J Zou - European Conference on …, 2024 - Springer
In this study, we address the intricate challenge of multi-task dense prediction,
encompassing tasks such as semantic segmentation, depth estimation, and surface normal …

CMGFA: A BEV Segmentation Model Based on Cross-Modal Group-Mix Attention Feature Aggregator

X Kuang, R Niu, C Hua, C Jiang, H Zhu… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
Bird's eye view (BEV) segmentation map is a recent development in autonomous driving that
provides effective environmental information, such as drivable areas and lane dividers. Most …

Spatiotemporal Vision Transformer for Weakly Supervised Dense Prediction of Dynamic Brain Maps

B Kazemivash, A Iraji, S Plis, V Calhoun - 2024 - bmva-archive.org.uk
Dynamic brain maps are crucial for comprehending brain dynamism, involving the study of
rapid changes in brain activity across different regions over time. However, computational …

Algorithm-hardware co-optimization for Transformer Neural Networks on the edge

I Knunyants - research.tue.nl
Transformer neural network architecture has revolutionized the deep learning field,
becoming the central architecture used in most relevant tasks. However, as the size of state …