Intriguing properties of vision transformers

MM Naseer, K Ranasinghe, SH Khan… - Advances in …, 2021 - proceedings.neurips.cc
Vision transformers (ViT) have demonstrated impressive performance across numerous
machine vision tasks. These models are based on multi-head self-attention mechanisms that …

Audio–visual segmentation

J Zhou, J Wang, J Zhang, W Sun, J Zhang… - … on Computer Vision, 2022 - Springer
We propose to explore a new problem called audio-visual segmentation (AVS), in which the
goal is to output a pixel-level map of the object(s) that produce sound at the time of the …

I can find you! Boundary-guided separated attention network for camouflaged object detection

H Zhu, P Li, H Xie, X Yan, D Liang, D Chen… - Proceedings of the …, 2022 - ojs.aaai.org
Can you find me? By simulating how humans discover the so-called 'perfectly'-camouflaged
object, we present a novel boundary-guided separated attention network (call …

Deep gradient learning for efficient camouflaged object detection

GP Ji, DP Fan, YC Chou, D Dai, A Liniger… - Machine Intelligence …, 2023 - Springer
This paper introduces deep gradient network (DGNet), a novel deep framework that exploits
object gradient supervision for camouflaged object detection (COD). It decouples the task …

AVSegFormer: Audio-visual segmentation with transformer

S Gao, Z Chen, G Chen, W Wang, T Lu - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Audio-visual segmentation (AVS) aims to locate and segment the sounding objects in a
given video, which demands audio-driven pixel-level scene understanding. The existing …

SwinE-Net: Hybrid deep learning approach to novel polyp segmentation using convolutional neural network and Swin Transformer

KB Park, JY Lee - Journal of Computational Design and …, 2022 - academic.oup.com
Prevention of colorectal cancer (CRC) by inspecting and removing colorectal polyps has
become a global health priority because CRC is one of the most frequent cancers in the …

HRTransNet: HRFormer-driven two-modality salient object detection

B Tang, Z Liu, Y Tan, Q He - … on Circuits and Systems for Video …, 2022 - ieeexplore.ieee.org
The High-Resolution Transformer (HRFormer) can maintain high-resolution representation
and share global receptive fields. It is friendly towards salient object detection (SOD) in …

CATR: Combinatorial-dependence audio-queried transformer for audio-visual video segmentation

K Li, Z Yang, L Chen, Y Yang, J Xiao - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Audio-visual video segmentation (AVVS) aims to generate pixel-level maps of
sound-producing objects within image frames and ensure the maps faithfully adhere to the given …

On improving adversarial transferability of vision transformers

M Naseer, K Ranasinghe, S Khan, FS Khan… - arXiv preprint arXiv …, 2021 - arxiv.org
Vision transformers (ViTs) process input images as sequences of patches via self-attention;
a radically different architecture than convolutional neural networks (CNNs). This makes it …

Boundary-guided network for camouflaged object detection

T Chen, J Xiao, X Hu, G Zhang, S Wang - Knowledge-Based Systems, 2022 - Elsevier
Compared with the traditional object segmentation/detection, camouflaged object detection
is much more difficult due to the indefinable boundaries and high intrinsic similarities …