Transface: Calibrating transformer training for face recognition from a data-centric perspective
Abstract Vision Transformers (ViTs) have demonstrated powerful representation ability in
various visual tasks thanks to their intrinsic data-hungry nature. However, we unexpectedly …
Egformer: Equirectangular geometry-biased transformer for 360 depth estimation
Estimating the depths of equirectangular (i.e., 360°) images (EIs) is challenging given the
distorted 180° x 360° field-of-view, which is hard to be addressed via convolutional neural …
Dexterous Grasp Transformer
In this work we propose a novel discriminative framework for dexterous grasp generation
named Dexterous Grasp TRansformer (DGTR) capable of predicting a diverse set of feasible …
Human-centric transformer for domain adaptive action recognition
We study the domain adaptation task for action recognition, namely domain adaptive action
recognition, which aims to effectively transfer action recognition power from a label-sufficient …
Vision transformer: To discover the “four secrets” of image patches
T Zhou, Y Niu, H Lu, C Peng, Y Guo, H Zhou - Information Fusion, 2024 - Elsevier
Abstract Vision Transformer (ViT) is widely used in the field of computer vision. In ViT, there
are four main steps, the “four secrets”: patch division, token selection, position …
Prompt guided transformer for multi-task dense prediction
Task-conditional architecture offers advantage in parameter efficiency but falls short in
performance compared to state-of-the-art multi-decoder methods. How to trade off …
[PDF] Bidirectional Dilation Transformer for Multispectral and Hyperspectral Image Fusion.
Transformer-based methods have proven to be effective in achieving long-distance
modeling, capturing the spatial and spectral information, and exhibiting strong inductive …
Factorization vision transformer: Modeling long-range dependency with local window cost
Transformers have astounding representational power but typically consume considerable
computation which is quadratic with image resolution. The prevailing Swin transformer …
MG-ViT: a multi-granularity method for compact and efficient vision transformers
Abstract Vision Transformer (ViT) faces obstacles in wide application due to its huge
computational cost. Almost all existing studies on compressing ViT adopt the manner of …
A model for detecting abnormal elevator passenger behavior based on video classification
J Lei, W Sun, Y Fang, N Ye, S Yang, J Wu - Electronics, 2024 - mdpi.com
In the task of human behavior detection, video classification based on deep learning has
become a prevalent technique. The existing models are limited due to an inadequate …