Intriguing properties of vision transformers
Vision transformers (ViT) have demonstrated impressive performance across numerous
machine vision tasks. These models are based on multi-head self-attention mechanisms that …
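The entry above notes that ViTs are built on multi-head self-attention. As a rough illustration only, here is a minimal pure-Python sketch of standard scaled dot-product multi-head self-attention. It is not the method of the cited paper. It assumes square d x d projection matrices and a feature size d divisible by the number of heads. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (rows of a against columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def multi_head_self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                              wo: Matrix, num_heads: int) -> Matrix:
    """x: n tokens x d features; wq, wk, wv, wo: d x d projection matrices."""
    d = len(x[0])
    if d % num_heads:
        raise ValueError("d must be divisible by num_heads")
    dh = d // num_heads
    # project every token to queries, keys and values
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    heads = []  # per-head outputs, each n x dh
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)  # this head's feature slice
        qh = [row[sl] for row in q]
        kh = [row[sl] for row in k]
        vh = [row[sl] for row in v]
        # scaled dot-product scores between every pair of tokens
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dh) for kj in kh]
                  for qi in qh]
        weights = [softmax(r) for r in scores]  # each row sums to 1
        heads.append(matmul(weights, vh))
    # concatenate heads along the feature axis, then apply the output projection
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, wo)


if __name__ == "__main__":
    eye = [[float(i == j) for j in range(4)] for i in range(4)]
    tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    out = multi_head_self_attention(tokens, eye, eye, eye, eye, num_heads=2)
    print(len(out), len(out[0]))  # 3 4
```

Each output row mixes information from every token. Within each head, the mixing weights are the softmax-normalized query-key similarities. Practical implementations add learned biases, batching, and vectorized tensor operations.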
Audio–visual segmentation
We propose to explore a new problem called audio-visual segmentation (AVS), in which the
goal is to output a pixel-level map of the object(s) that produce sound at the time of the …
I can find you! Boundary-guided separated attention network for camouflaged object detection
Can you find me? By simulating how humans discover the so-called 'perfectly'-camouflaged
object, we present a novel boundary-guided separated attention network (call …
Deep gradient learning for efficient camouflaged object detection
This paper introduces deep gradient network (DGNet), a novel deep framework that exploits
object gradient supervision for camouflaged object detection (COD). It decouples the task …
AVSegFormer: Audio-visual segmentation with transformer
Audio-visual segmentation (AVS) aims to locate and segment the sounding objects in a
given video, which demands audio-driven pixel-level scene understanding. The existing …
SwinE-Net: Hybrid deep learning approach to novel polyp segmentation using convolutional neural network and Swin Transformer
Prevention of colorectal cancer (CRC) by inspecting and removing colorectal polyps has
become a global health priority because CRC is one of the most frequent cancers in the …
HRTransNet: HRFormer-driven two-modality salient object detection
The High-Resolution Transformer (HRFormer) can maintain high-resolution representation
and share global receptive fields. It is friendly towards salient object detection (SOD) in …
CATR: Combinatorial-dependence audio-queried transformer for audio-visual video segmentation
Audio-visual video segmentation (AVVS) aims to generate pixel-level maps of sound-producing
objects within image frames and ensure the maps faithfully adhere to the given …
On improving adversarial transferability of vision transformers
Vision transformers (ViTs) process input images as sequences of patches via self-attention;
a radically different architecture than convolutional neural networks (CNNs). This makes it …
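The entry above describes ViTs as processing input images as sequences of patches. Below is a minimal plain-Python sketch of that tokenization step. It assumes a nested-list H x W x C image whose sides are divisible by the patch size. The function name `image_to_patches` is hypothetical. The learned linear patch embedding and position embeddings that real ViTs apply afterwards are omitted.

```python
from typing import List

Pixel = List[float]          # one pixel: C channel values
Image = List[List[Pixel]]    # H rows of W pixels


def image_to_patches(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles.

    Each tile is flattened row-major into a vector of length patch*patch*C,
    and tiles are ordered left-to-right, top-to-bottom, giving a sequence of
    (H // patch) * (W // patch) tokens.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    vec.extend(image[r][c])  # append all channels of this pixel
            tokens.append(vec)
    return tokens


if __name__ == "__main__":
    # 4x4 single-channel image with pixel value = row * 4 + col
    img = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
    seq = image_to_patches(img, 2)
    print(len(seq), len(seq[0]))  # 4 4
    print(seq[0])                 # [0.0, 1.0, 4.0, 5.0]
```

The resulting token sequence is what self-attention operates on. Unlike a convolution, no layer after this step assumes a fixed spatial neighborhood.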
Boundary-guided network for camouflaged object detection
T Chen, J Xiao, X Hu, G Zhang, S Wang - Knowledge-Based Systems, 2022 - Elsevier
Compared with the traditional object segmentation/detection, camouflaged object detection
is much more difficult due to the indefinable boundaries and high intrinsic similarities …