Adventures in data analysis: A systematic review of Deep Learning techniques for pattern recognition in cyber-physical-social systems

Z Amiri, A Heidari, NJ Navimipour, M Unal… - Multimedia Tools and …, 2024 - Springer
Abstract Machine Learning (ML) and Deep Learning (DL) have achieved high success in
many textual, auditory, medical imaging, and visual recognition patterns. Concerning the …

Are we ready for a new paradigm shift? a survey on visual deep mlp

R Liu, Y Li, L Tao, D Liang, HT Zheng - Patterns, 2022 - cell.com
Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …

Scaling & shifting your features: A new baseline for efficient model tuning

D Lian, D Zhou, J Feng, X Wang - Advances in Neural …, 2022 - proceedings.neurips.cc
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-
tuning), which is not efficient, or only tune the last linear layer (linear probing), which suffers …

Davit: Dual attention vision transformers

M Ding, B **ao, N Codella, P Luo, J Wang… - European conference on …, 2022 - Springer
In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective
vision transformer architecture that is able to capture global context while maintaining …

Metaformer baselines for vision

W Yu, C Si, P Zhou, M Luo, Y Zhou… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
MetaFormer, the abstracted architecture of Transformer, has been found to play a significant
role in achieving competitive performance. In this paper, we further explore the capacity of …

Metaformer is actually what you need for vision

W Yu, M Luo, P Zhou, C Si, Y Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com
Transformers have shown great potential in computer vision tasks. A common belief is their
attention-based token mixer module contributes most to their competence. However, recent …

Centralized feature pyramid for object detection

Y Quan, D Zhang, L Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The visual feature pyramid has shown its superiority in both effectiveness and efficiency in a
variety of applications. However, current methods overly focus on inter-layer feature …

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

Delivering arbitrary-modal semantic segmentation

J Zhang, R Liu, H Shi, K Yang, S Reiß… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multimodal fusion can make semantic segmentation more robust. However, fusing an
arbitrary number of modalities remains underexplored. To delve into this problem, we create …

Conv2former: A simple transformer-style convnet for visual recognition

Q Hou, CZ Lu, MM Cheng… - IEEE transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformers have been the most popular network architecture in visual recognition
recently due to the strong ability of encode global information. However, its high …