A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …

Pure transformers are powerful graph learners

J Kim, D Nguyen, S Min, S Cho… - Advances in Neural …, 2022 - proceedings.neurips.cc
We show that standard Transformers without graph-specific modifications can lead to
promising results in graph learning both in theory and practice. Given a graph, we simply …

ReLoRA: High-rank training through low-rank updates

V Lialin, N Shivagunde, S Muckatira… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the dominance and effectiveness of scaling, resulting in large networks with
hundreds of billions of parameters, the necessity to train overparameterized models remains …

ViTCoD: Vision transformer acceleration via dedicated algorithm and accelerator co-design

H You, Z Sun, H Shi, Z Yu, Y Zhao… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved state-of-the-art performance on various vision
tasks. However, ViTs' self-attention module is still arguably a major bottleneck, limiting their …

Uniform memory retrieval with larger capacity for modern hopfield models

D Wu, JYC Hu, TY Hsiao, H Liu - arXiv preprint arXiv:2404.03827, 2024 - arxiv.org
We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed
$\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a …

BERT-based deep spatial-temporal network for taxi demand prediction

D Cao, K Zeng, J Wang, PK Sharma… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
Taxi demand prediction plays a significant role in assisting the pre-allocation of taxi
resources to avoid mismatches between demand and service, particularly in the era of the …

BAF-detector: An efficient CNN-based detector for photovoltaic cell defect detection

B Su, H Chen, Z Zhou - IEEE Transactions on Industrial …, 2021 - ieeexplore.ieee.org
The multiscale defect detection for photovoltaic (PV) cell electroluminescence (EL) images is
a challenging task due to feature vanishing as the network deepens. To address this …

Transformers are minimax optimal nonparametric in-context learners

J Kim, T Nakamaki, T Suzuki - Advances in Neural …, 2025 - proceedings.neurips.cc
In-context learning (ICL) of large language models has proven to be a surprisingly effective
method of learning a new task from only a few demonstrative examples. In this paper, we …