Self-supervised multimodal learning: A survey
Y Zong, O Mac Aodha, T Hospedales - ar…
… bespoke self-supervised multimodal learning approaches. However, current …
CROMA: Remote sensing representations with contrastive radar-optical masked autoencoders
A vital and rapidly growing application, remote sensing offers vast yet sparsely labeled,
spatially aligned multimodal data; this makes self-supervised learning algorithms invaluable …
Spiking tucker fusion transformer for audio-visual zero-shot learning
The spiking neural networks (SNNs) that efficiently encode temporal sequences have shown
great potential in extracting audio-visual joint feature representations. However, coupling …