VMamba: Visual state space model

Y Liu, Y Tian, Y Zhao, H Yu, L Xie… - Advances in neural …, 2025 - proceedings.neurips.cc
Designing computationally efficient network architectures remains an ongoing necessity in
computer vision. In this paper, we adapt Mamba, a state-space language model, into …

Large selective kernel network for remote sensing object detection

Y Li, Q Hou, Z Zheng, MM Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …

Poly kernel inception network for remote sensing detection

X Cai, Q Lai, Y Wang, W Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Object detection in remote sensing images (RSIs) often suffers from several increasing
challenges including the large variation in object scales and the diverse-ranging context …

UniRepLKNet: A universal perception large-kernel ConvNet for audio, video, point cloud, time-series and image recognition

X Ding, Y Zhang, Y Ge, S Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large-kernel convolutional neural networks (ConvNets) have recently received extensive
research attention but two unresolved and critical issues demand further investigation. 1) …

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

InceptionNeXt: When Inception meets ConvNeXt

W Yu, P Zhou, S Yan, X Wang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely
studied and adopted recently to enlarge the receptive field and improve model performance …

Large separable kernel attention: Rethinking the large kernel attention design in CNN

KW Lau, LM Po, YAU Rehman - Expert Systems with Applications, 2024 - Elsevier
Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have
been shown to provide remarkable performance that surpasses Vision Transformers (ViTs) …

Deep-learning-based semantic segmentation of remote sensing images: A survey

L Huang, B Jiang, S Lv, Y Liu… - IEEE Journal of Selected …, 2023 - ieeexplore.ieee.org
Semantic segmentation of remote sensing images (SSRSIs), which aims to assign a
category to each pixel in remote sensing images, plays a vital role in a broad range of …

MetaFormer baselines for vision

W Yu, C Si, P Zhou, M Luo, Y Zhou… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
MetaFormer, the abstracted architecture of Transformer, has been found to play a significant
role in achieving competitive performance. In this paper, we further explore the capacity of …

Conv2Former: A simple transformer-style ConvNet for visual recognition

Q Hou, CZ Lu, MM Cheng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformers have been the most popular network architecture in visual recognition
recently due to the strong ability to encode global information. However, its high …