Pruning self-attentions into convolutional layers in single path

H He, J Cai, J Liu, Z Pan, J Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved impressive performance over various computer
vision tasks. However, modeling global correlations with multi-head self-attention (MSA) …

Towards flexible inductive bias via progressive reparameterization scheduling

Y Lee, G Lee, K Ryoo, H Go, J Park, S Kim - European Conference on …, 2022 - Springer
There are two de facto standard architectures in recent computer vision: Convolutional
Neural Networks (CNNs) and Vision Transformers (ViTs). Strong inductive biases of …

Overparametrization, architecture and dynamics of deep neural networks: from theory to practice

S d'Ascoli - 2022 - theses.hal.science
Deep learning has become the cornerstone of artificial intelligence, and has fueled
breakthroughs in a number of fields. Yet, the key reasons underpinning the success of deep …

CSA-BERT: Video Question Answering

K Jenni, M Srinivas, R Sannapu… - 2023 IEEE Statistical …, 2023 - ieeexplore.ieee.org
Convolutional networks are a key component of many computer vision applications.
However, convolutions have a serious flaw: they only operate over a small local area, hence they lack …

Towards Efficient Training and Inference of Large Transformer Models

H He - 2024 - bridges.monash.edu
Transformers have revolutionized modern applications but are costly as model sizes grow.
This thesis targets efficient training and inference of large Transformer models. We first …

Adaptive Attention Link-based Regularization for Vision Transformers

H Jin, J Choi - arXiv preprint arXiv:2211.13852, 2022 - arxiv.org
Although transformer networks have recently been employed in various vision tasks with
strong performance, extensive training data and a lengthy training time are required …

[PDF] Tackling the 2021 Algonauts Challenge with Semi-Supervised Networks & Bayesian Optimization

RT Lange - algonauts.csail.mit.edu
Deep neural networks have been widely adopted as state-of-the-art models of the visual
ventral stream (e.g. Cadieu et al., 2014; Yamins and DiCarlo, 2016; Cichy et al., 2016). Most …