Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to deliver promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

Token contrast for weakly-supervised semantic segmentation

L Ru, H Zheng, Y Zhan, B Du - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels
typically utilizes a Class Activation Map (CAM) to generate the pseudo labels. Limited by the …

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

AdaMV-MoE: Adaptive multi-task vision mixture-of-experts

T Chen, X Chen, X Du, A Rashwan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Sparsely activated Mixture-of-Experts (MoE) is becoming a promising paradigm for
multi-task learning (MTL). Instead of compressing multiple tasks' knowledge into a single …

Masked relation learning for deepfake detection

Z Yang, J Liang, Y Xu, XY Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
DeepFake detection aims to differentiate falsified faces from real ones. Most approaches
formulate it as a binary classification problem by solely mining the local artifacts and …

PPT: Token-pruned pose transformer for monocular and multi-view human pose estimation

H Ma, Z Wang, Y Chen, D Kong, L Chen, X Liu… - … on Computer Vision, 2022 - Springer
Recently, the vision transformer and its variants have played an increasingly important role
in both monocular and multi-view human pose estimation. Considering image patches as …

On filtrations of A (V)

J Liu - arXiv preprint arXiv:2103.08090, 2021 - arxiv.org
The filtrations on Zhu's algebra $A(V)$ and bimodules $A(M)$ are studied. As an
application, we prove that $A(V)$ is noetherian when $V$ is strongly finitely generated …

Sparse MoE as the new dropout: Scaling dense and self-slimmable transformers

T Chen, Z Zhang, A Jaiswal, S Liu, Z Wang - arXiv preprint arXiv …, 2023 - arxiv.org
Despite their remarkable achievement, gigantic transformers encounter significant
drawbacks, including exorbitant computational and memory footprints during training, as …

SHViT: Single-head vision transformer with memory efficient macro design

S Yun, Y Ro - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recently, efficient Vision Transformers have shown great performance with low
latency on resource-constrained devices. Conventionally, they use 4×4 patch embeddings …

The lighter the better: rethinking transformers in medical image segmentation through adaptive pruning

X Lin, L Yu, KT Cheng, Z Yan - IEEE Transactions on Medical …, 2023 - ieeexplore.ieee.org
Vision transformers have recently set off a new wave in the field of medical image analysis
due to their remarkable performance on various computer vision tasks. However, recent …