GSVA: Generalized segmentation via multimodal large language models

Z Xia, D Han, Y Han, X Pan, S Song… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …

Mosaic: in-memory computing and routing for small-world spike-based neuromorphic systems

T Dalgaty, F Moro, Y Demirağ, A De Pra… - Nature …, 2024 - nature.com
The brain's connectivity is locally dense and globally sparse, forming a small-world graph—
a principle prevalent in the evolution of various species, suggesting a universal solution for …

TransXNet: learning both global and local dynamics with a dual dynamic token mixer for visual recognition

M Lou, HY Zhou, S Yang, Y Yu - arXiv preprint arXiv:2310.19380, 2023 - arxiv.org
Recent studies have integrated convolution into transformers to introduce inductive bias and
improve generalization performance. However, the static nature of conventional convolution …

Efficient diffusion transformer with step-wise dynamic attention mediators

Y Pu, Z Xia, J Guo, D Han, Q Li, D Li, Y Yuan… - … on Computer Vision, 2024 - Springer
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …

MeSAM: Multiscale enhanced segment anything model for optical remote sensing images

X Zhou, F Liang, L Chen, H Liu, Q Song… - … on Geoscience and …, 2024 - ieeexplore.ieee.org
Segment anything model (SAM) has been widely applied to various downstream tasks for its
excellent performance and generalization capability. However, SAM exhibits three …

CT-Net: Asymmetric compound branch transformer for medical image segmentation

N Zhang, L Yu, D Zhang, W Wu, S Tian, X Kang, M Li - Neural Networks, 2024 - Elsevier
The Transformer architecture has been widely applied in the field of image segmentation
due to its powerful ability to capture long-range dependencies. However, its ability to capture …

DAT++: Spatially dynamic vision transformer with deformable attention

Z Xia, X Pan, S Song, LE Li, G Huang - arXiv preprint arXiv:2309.01430, 2023 - arxiv.org
Transformers have shown superior performance on various vision tasks. Their large
receptive field endows Transformer models with higher representation power than their CNN …

On the role of attention masks and LayerNorm in transformers

X Wu, A Ajorlou, Y Wang, S Jegelka… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-attention is the key mechanism of transformers, which are the essential building blocks
of modern foundation models. Recent studies have shown that pure self-attention suffers …

MG-ViT: a multi-granularity method for compact and efficient vision transformers

Y Zhang, Y Liu, D Miao, Q Zhang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Vision Transformer (ViT) faces obstacles in wide application due to its huge
computational cost. Almost all existing studies on compressing ViT adopt the manner of …

ViT-MVT: A unified vision transformer network for multiple vision tasks

T Xie, K Dai, Z Jiang, R Li, S Mao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this work, we seek to learn multiple mainstream vision tasks concurrently using a unified
network, which is storage-efficient as numerous networks with task-shared parameters can …