A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Physics-informed machine learning

GE Karniadakis, IG Kevrekidis, L Lu… - Nature Reviews …, 2021 - nature.com
Despite great progress in simulating multiphysics problems using the numerical
discretization of partial differential equations (PDEs), one still cannot seamlessly incorporate …
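
As loose background on the idea named in the title (not the authors' specific formulation), a physics-informed loss augments ordinary data fitting with a penalty on a differential-equation residual. The toy sketch below does this for du/dx = -u with u(0) = 1; the network size, collocation points, and weight `lam` are assumptions made purely for illustration.

```python
# Illustrative physics-informed loss for the toy ODE du/dx = -u, u(0) = 1
# (exact solution exp(-x)); not the formulation from the cited review.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(                 # small MLP u_theta(x); size is an assumption
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
lam = 1.0                                  # assumed weight on the physics term

for step in range(2000):
    x = torch.rand(64, 1, requires_grad=True)          # collocation points in [0, 1]
    u = net(x)
    du_dx = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
                                create_graph=True)[0]
    pde_residual = du_dx + u                            # enforces du/dx + u = 0
    data_loss = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()   # boundary condition u(0) = 1
    loss = data_loss + lam * pde_residual.pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```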

CoAtNet: Marrying convolution and attention for all data sizes

Z Dai, H Liu, QV Le, M Tan - Advances in neural information …, 2021 - proceedings.neurips.cc
Transformers have attracted increasing interest in computer vision, but they still fall behind
state-of-the-art convolutional networks. In this work, we show that while Transformers tend to …

Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD Conference …, 2022 - dl.acm.org
The field of graph neural networks (GNNs) has seen rapid and incredible strides in recent
years. Graph neural networks, also known as deep learning on graphs, graph …
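
As background on what "deep learning on graphs" computes, the sketch below runs one graph-convolution step of the widely used Kipf-Welling propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); it is only one common GNN layer among the many variants such a survey covers, and the toy graph, features, and weights are made up.

```python
# One graph-convolution step H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)
# on a made-up 3-node graph.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.],                 # adjacency of a 3-node path graph
              [1., 0., 1.],
              [0., 1., 0.]])
H = rng.standard_normal((3, 4))             # node features: 3 nodes, 4 dims
W = rng.standard_normal((4, 2))             # layer weights: 4 -> 2 dims

A_hat = A + np.eye(3)                       # add self-loops
d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H_next = np.maximum(0.0, d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)   # ReLU
```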

The impact of positional encoding on length generalization in transformers

A Kazemnejad, I Padhi… - Advances in …, 2024 - proceedings.neurips.cc
Length generalization, the ability to generalize from small training context sizes to larger
ones, is a critical challenge in the development of Transformer-based language models …
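
For context, the absolute sinusoidal encoding from the original Transformer is one of the positional-encoding schemes such work compares. The sketch below only reproduces that standard formula as background and makes no claim about the paper's findings; the sequence length and model dimension are arbitrary.

```python
# Absolute sinusoidal positional encoding from the original Transformer,
# shown only as background; this does not reproduce the paper's analysis.
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]                # (1, d_model/2)
    angles = pos / (10000.0 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                        # even dimensions
    pe[:, 1::2] = np.cos(angles)                        # odd dimensions
    return pe

pe = sinusoidal_pe(seq_len=128, d_model=64)             # one row per token position
```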

A review on the attention mechanism of deep learning

Z Niu, G Zhong, H Yu - Neurocomputing, 2021 - Elsevier
Attention has arguably become one of the most important concepts in the deep learning
field. It is inspired by the biological systems of humans that tend to focus on the distinctive …
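
As a concrete reference point for the mechanism being surveyed, scaled dot-product attention computes softmax(Q K^T / sqrt(d)) V; a minimal NumPy sketch with toy shapes (the sizes are arbitrary):

```python
# Scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, with toy shapes.
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))   # toy queries/keys/values
out = attention(Q, K, V)                                     # shape (5, 8)
```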

Informer: Beyond efficient transformer for long sequence time-series forecasting

H Zhou, S Zhang, J Peng, S Zhang, J Li… - Proceedings of the …, 2021 - ojs.aaai.org
Many real-world applications require the prediction of long sequence time-series, such as
electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a …

Graph neural networks for natural language processing: A survey

L Wu, Y Chen, K Shen, X Guo, H Gao… - … and Trends® in …, 2023 - nowpublishers.com
Deep learning has become the dominant approach in addressing various tasks in Natural
Language Processing (NLP). Although text inputs are typically represented as a sequence …

Rethinking attention with performers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
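
The linear-complexity idea can be sketched as kernelized attention: replace softmax(Q K^T) V with phi(Q) (phi(K)^T V), which is linear rather than quadratic in sequence length. The positive random-feature map below is a simplified stand-in for illustration, not the paper's exact FAVOR+ construction, and all sizes are arbitrary.

```python
# Kernelized linear attention: phi(Q) @ (phi(K)^T @ V) costs O(n) in sequence
# length n instead of the O(n^2) of exact softmax attention. The positive
# random-feature map here is a simplified stand-in, not the exact FAVOR+ map.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 256, 16, 64                            # sequence length, head dim, num features

def feature_map(X, W):
    # phi(x) ~ exp(w^T x - ||x||^2 / 2) / sqrt(m), so E[phi(q) . phi(k)] = exp(q . k)
    return np.exp(X @ W.T - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(W.shape[0])

Q = rng.standard_normal((n, d)) / d ** 0.25      # scale so Q K^T matches softmax logits
K = rng.standard_normal((n, d)) / d ** 0.25
V = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))                  # shared random projections

Qf, Kf = feature_map(Q, W), feature_map(K, W)    # (n, m) each
num = Qf @ (Kf.T @ V)                            # never forms the n x n attention matrix
den = Qf @ Kf.sum(axis=0)[:, None]               # per-query normalizer, shape (n, 1)
out = num / den                                  # approximate attention output
```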

Fast attention requires bounded entries

J Alman, Z Song - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In modern machine learning, inner product attention computation is a fundamental task for
training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and …