Self-supervised audiovisual representation learning for remote sensing data

K Heidler, L Mou, D Hu, P Jin, G Li, C Gan… - International Journal of …, 2023 - Elsevier
Many deep learning approaches make extensive use of backbone networks pretrained on
large datasets like ImageNet, which are then fine-tuned. In remote sensing, the lack of …

Efficient high-resolution deep learning: A survey

A Bakhtiarnia, Q Zhang, A Iosifidis - ACM Computing Surveys, 2024 - dl.acm.org
Cameras in modern devices such as smartphones, satellites and medical equipment are
capable of capturing very high resolution images and videos. Such high-resolution data …

Repetitive activity counting by sight and sound

Y Zhang, L Shao, CGM Snoek - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
This paper strives for repetitive activity counting in videos. Different from existing works,
which all analyze the visual video content only, we incorporate for the first time the …

Audio-visual transformer based crowd counting

U Sajid, X Chen, H Sajid, T Kim… - Proceedings of the …, 2021 - openaccess.thecvf.com
Crowd estimation is a very challenging problem. The most recent study tries to exploit
auditory information to aid the visual models, however, the performance is limited due to the …

GAF-Net: improving the performance of remote sensing image fusion using novel global self and cross attention learning

A Jha, S Bose, B Banerjee - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
The notion of self and cross-attention learning has been found to substantially boost the
performance of remote sensing (RS) image fusion. However, while the self-attention models …

Single-layer vision transformers for more accurate early exits with less overhead

A Bakhtiarnia, Q Zhang, A Iosifidis - Neural Networks, 2022 - Elsevier
Deploying deep learning models in time-critical applications with limited computational
resources, for instance in edge computing systems and IoT networks, is a challenging task …

Perceptual score: What data modalities does your model perceive?

I Gat, I Schwartz, A Schwing - Advances in Neural …, 2021 - proceedings.neurips.cc
Machine learning advances in the last decade have relied significantly on large-
scale datasets that continue to grow in size. Increasingly, those datasets also contain …

Advances in convolution neural networks based crowd counting and density estimation

R Gouiaa, MA Akhloufi, M Shahbazi - Big Data and Cognitive Computing, 2021 - mdpi.com
Automatically estimating the number of people in unconstrained scenes is a crucial yet
challenging task in different real-world applications, including video surveillance, public …

Multi-exit vision transformer for dynamic inference

A Bakhtiarnia, Q Zhang, A Iosifidis - arXiv preprint arXiv:2106.15183, 2021 - arxiv.org
Deep neural networks can be converted to multi-exit architectures by inserting early exit
branches after some of their intermediate layers. This allows their inference process to …

Scene-adaptive crowd counting method based on meta learning with dual-input network DMNet

H Zhao, W Min, J Xu, Q Wang, Y Zou, Q Fu - Frontiers of Computer Science, 2023 - Springer
Crowd counting is recently becoming a hot research topic, which aims to count the number
of the people in different crowded scenes. Existing methods are mainly based on training …