Learning Modality-Agnostic Representation for Semantic Segmentation from Any Modalities
Image modality is not perfect, as it often fails in certain conditions, e.g., night and fast motion.
This significantly limits the robustness and versatility of existing multi-modal (i.e., Image + X) …
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
We present UniBind, a flexible and efficient approach that learns a unified representation
space for seven diverse modalities: images, text, audio, point cloud, thermal, video, and event …
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event
camera sensing. The difficulties in interpreting and annotating event data limit its scalability …
Event Camera Data Dense Pre-training
This paper introduces a self-supervised learning framework designed for pre-training neural
networks tailored to dense prediction tasks using event camera data. Our approach utilizes …
CEIA: CLIP-based Event-Image Alignment for Open-World Event-based Understanding
We present CEIA, an effective framework for open-world event-based understanding.
Currently, training a large event-text model still poses a huge challenge due to the shortage …
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
Event cameras have recently been shown beneficial for practical vision tasks such as action
recognition, thanks to their high temporal resolution, power efficiency, and reduced privacy …
Image Anything: Towards Reasoning-Coherent and Training-Free Multi-Modal Image Generation
The multifaceted nature of human perception and comprehension indicates that, when we
think, our body can naturally take any combination of senses, a.k.a. modalities, and form a …
EZSR: Event-based Zero-Shot Recognition
This paper studies zero-shot object recognition using event camera data. Guided by CLIP,
which is pre-trained on RGB images, existing approaches achieve zero-shot object …
OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All
Research on multi-modal learning predominantly aligns the modalities in a unified space at
training, and only a single one is taken for prediction at inference. However, for a real …
EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More
Recently, electroencephalography (EEG) signals have been actively incorporated to decode
brain activity in response to visual or textual stimuli and achieve object recognition in multi-modal AI …