Robustness-aware 3D object detection in autonomous driving: A review and outlook
In the realm of modern autonomous driving, the perception system is indispensable for
accurately assessing the state of the surrounding environment, thereby enabling informed …
YOLOv9: Learning what you want to learn using programmable gradient information
Today's deep learning methods focus on how to design the objective functions to make the
prediction as close as possible to the target. Meanwhile, an appropriate neural network …
Tip-Adapter: Training-free adaption of CLIP for few-shot classification
Contrastive Vision-Language Pre-training, known as CLIP, has provided a new
paradigm for learning visual representations using large-scale image-text pairs. It shows …
Point-M2AE: Multi-scale masked autoencoders for hierarchical point cloud pre-training
Masked Autoencoders (MAE) have shown great potentials in self-supervised pre-training for
language and 2D image transformers. However, it still remains an open question on how to …
A survey of visual transformers
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …
PiMAE: Point cloud and image interactive masked autoencoders for 3D object detection
Masked Autoencoders learn strong visual representations and achieve state-of-the-art
results in several independent modalities, yet very few works have addressed their …
Recent advances and perspectives in deep learning techniques for 3D point cloud data processing
In recent years, deep learning techniques for processing 3D point cloud data have seen
significant advancements, given their unique ability to extract relevant features and handle …
Vision-centric BEV perception: A survey
In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant
interest from both industry and academia due to its inherent advantages, such as providing …
Query-dependent video representation for moment retrieval and highlight detection
Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as
the demand for video understanding is drastically increased. The key objective of MR/HD is …
CALIP: Zero-shot enhancement of CLIP with parameter-free attention
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual
representations with promising zero-shot performance. To further improve its downstream …