Self-supervised audiovisual representation learning for remote sensing data
Many deep learning approaches make extensive use of backbone networks pretrained on
large datasets like ImageNet, which are then fine-tuned. In remote sensing, the lack of …
Efficient high-resolution deep learning: A survey
Cameras in modern devices such as smartphones, satellites and medical equipment are
capable of capturing very high resolution images and videos. Such high-resolution data …
Repetitive activity counting by sight and sound
This paper strives for repetitive activity counting in videos. Different from existing works,
which all analyze the visual video content only, we incorporate for the first time the …
Audio-visual transformer based crowd counting
Crowd estimation is a very challenging problem. The most recent study tries to exploit
auditory information to aid the visual models, however, the performance is limited due to the …
GAF-Net: improving the performance of remote sensing image fusion using novel global self and cross attention learning
The notion of self and cross-attention learning has been found to substantially boost the
performance of remote sensing (RS) image fusion. However, while the self-attention models …
Single-layer vision transformers for more accurate early exits with less overhead
Deploying deep learning models in time-critical applications with limited computational
resources, for instance in edge computing systems and IoT networks, is a challenging task …
Perceptual score: What data modalities does your model perceive?
Machine learning advances in the last decade have relied significantly on
large-scale datasets that continue to grow in size. Increasingly, those datasets also contain …
Advances in convolution neural networks based crowd counting and density estimation
Automatically estimating the number of people in unconstrained scenes is a crucial yet
challenging task in different real-world applications, including video surveillance, public …
Multi-exit vision transformer for dynamic inference
Deep neural networks can be converted to multi-exit architectures by inserting early exit
branches after some of their intermediate layers. This allows their inference process to …
Scene-adaptive crowd counting method based on meta learning with dual-input network DMNet
Crowd counting is recently becoming a hot research topic, which aims to count the number
of the people in different crowded scenes. Existing methods are mainly based on training …