Learning modality-agnostic representation for semantic segmentation from any modalities

X Zheng, Y Lyu, L Wang - European Conference on Computer Vision, 2024 - Springer
The image modality is not perfect, as it often fails in certain conditions, e.g., night and fast motion.
This significantly limits the robustness and versatility of existing multi-modal (i.e., Image+X) …

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All

Y Lyu, X Zheng, J Zhou, L Wang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We present UniBind, a flexible and efficient approach that learns a unified representation
space for seven diverse modalities--images, text, audio, point cloud, thermal, video, and event …

OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

L Kong, Y Liu, LX Ng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event
camera sensing. The difficulties in interpreting and annotating event data limit its scalability …

Event camera data dense pre-training

Y Yang, L Pan, L Liu - European Conference on Computer Vision, 2024 - Springer
This paper introduces a self-supervised learning framework designed for pre-training neural
networks tailored to dense prediction tasks using event camera data. Our approach utilizes …

Ceia: Clip-based event-image alignment for open-world event-based understanding

W Xu, W Weng, Y Zhang, Z Xiong - arXiv preprint arXiv:2407.06611, 2024 - arxiv.org
We present CEIA, an effective framework for open-world event-based understanding.
Currently, training a large event-text model still poses a huge challenge due to the shortage …

ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More

J Zhou, X Zheng, Y Lyu, L Wang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Event cameras have recently been shown beneficial for practical vision tasks, such as action
recognition, thanks to their high temporal resolution, power efficiency, and reduced privacy …

Image anything: Towards reasoning-coherent and training-free multi-modal image generation

Y Lyu, X Zheng, L Wang - arXiv preprint arXiv:2401.17664, 2024 - arxiv.org
The multifaceted nature of human perception and comprehension indicates that, when we
think, our body can naturally take any combination of senses, a.k.a. modalities, and form a …

EZSR: Event-based Zero-Shot Recognition

Y Yang, L Pan, D Li, L Liu - arXiv preprint arXiv:2407.21616, 2024 - arxiv.org
This paper studies zero-shot object recognition using event camera data. Guided by CLIP,
which is pre-trained on RGB images, existing approaches achieve zero-shot object …

OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All

Y Lyu, X Zheng, D Kim, L Wang - arXiv preprint arXiv:2405.16108, 2024 - arxiv.org
Research on multi-modal learning predominantly aligns the modalities in a unified space at
training, and only a single one is taken for prediction at inference. However, for a real …

EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More

X Zheng, L Wang, K Chen, Y Lyu, J Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, electroencephalography (EEG) signals have been actively incorporated to decode
brain activity in response to visual or textual stimuli and to achieve object recognition in multi-modal AI …