A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

H Wu, J Xu, J Wang, M Long - Advances in neural …, 2021 - proceedings.neurips.cc
Extending the forecasting time is a critical demand for real applications, such as extreme
weather early warning and long-term energy consumption planning. This paper studies the …
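
The snippet above stops at the motivation; as a rough illustration of the series-decomposition idea named in the title, the sketch below splits a series into a moving-average trend and a seasonal remainder. It is a simplified reading only (PyTorch, with an arbitrary window size), not the paper's Auto-Correlation mechanism or its full decomposition block.

    import torch
    import torch.nn.functional as F

    def series_decomp(x: torch.Tensor, kernel_size: int = 25):
        """Split series of shape (batch, length) into trend + seasonal parts
        via a moving average; kernel_size is an arbitrary illustrative choice."""
        pad = (kernel_size - 1) // 2
        # replicate-pad the ends so the moving average keeps the original length
        x_padded = F.pad(x.unsqueeze(1), (pad, kernel_size - 1 - pad), mode="replicate")
        trend = F.avg_pool1d(x_padded, kernel_size, stride=1).squeeze(1)
        seasonal = x - trend
        return seasonal, trend

    t = torch.linspace(0, 8 * torch.pi, 200)
    series = (0.05 * t + torch.sin(t)).unsqueeze(0)   # slow trend + periodic component
    seasonal, trend = series_decomp(series)
    print(seasonal.shape, trend.shape)                # torch.Size([1, 200]) twice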

CoAtNet: Marrying convolution and attention for all data sizes

Z Dai, H Liu, QV Le, M Tan - Advances in neural information …, 2021 - proceedings.neurips.cc
Transformers have attracted increasing interest in computer vision, but they still fall behind
state-of-the-art convolutional networks. In this work, we show that while Transformers tend to …
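
The title's "marrying convolution and attention" can be pictured with the toy module below: a depthwise-convolution stage for local features followed by self-attention over the flattened feature map for global mixing. This is only a generic hybrid sketch in PyTorch; the layer sizes are arbitrary and it does not reproduce the paper's staged layout or relative attention.

    import torch
    import torch.nn as nn

    class ConvThenAttention(nn.Module):
        """Toy hybrid block: local features via depthwise conv, then global
        mixing via multi-head self-attention over the flattened feature map."""
        def __init__(self, channels: int = 64, heads: int = 4):
            super().__init__()
            self.dwconv = nn.Conv2d(channels, channels, kernel_size=3,
                                    padding=1, groups=channels)   # depthwise conv
            self.pwconv = nn.Conv2d(channels, channels, kernel_size=1)
            self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
            self.norm = nn.LayerNorm(channels)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, channels, height, width)
            x = x + self.pwconv(self.dwconv(x))               # convolutional stage
            b, c, h, w = x.shape
            tokens = self.norm(x.flatten(2).transpose(1, 2))  # (batch, h*w, channels)
            attn_out, _ = self.attn(tokens, tokens, tokens)   # attention stage
            tokens = tokens + attn_out
            return tokens.transpose(1, 2).reshape(b, c, h, w)

    block = ConvThenAttention()
    print(block(torch.randn(2, 64, 14, 14)).shape)   # torch.Size([2, 64, 14, 14])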

RoFormer: Enhanced transformer with rotary position embedding

J Su, M Ahmed, Y Lu, S Pan, W Bo, Y Liu - Neurocomputing, 2024 - Elsevier
Position encoding has recently been shown to be effective in the transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
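
The rotary position embedding named in the title rotates each pair of query/key feature dimensions by an angle proportional to the token position, so that dot products between rotated queries and keys depend only on their relative offset. A minimal sketch of the commonly published formulation (the base of 10000 and the shapes here are conventional choices, not taken from this snippet):

    import torch

    def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
        """Apply rotary position embedding to x of shape (seq_len, dim), dim even.
        Each dimension pair (2i, 2i+1) is rotated by pos * base**(-2i/dim)."""
        seq_len, dim = x.shape
        pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)            # (seq_len, 1)
        inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim) # (dim/2,)
        angles = pos * inv_freq                                                  # (seq_len, dim/2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[:, 0::2], x[:, 1::2]
        rotated = torch.empty_like(x)
        rotated[:, 0::2] = x1 * cos - x2 * sin
        rotated[:, 1::2] = x1 * sin + x2 * cos
        return rotated

    q, k = torch.randn(6, 8), torch.randn(6, 8)
    # after rotation, q_i . k_j depends only on the relative offset i - j
    scores = rotary_embed(q) @ rotary_embed(k).T
    print(scores.shape)   # torch.Size([6, 6])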

Exploring the limits of transfer learning with a unified text-to-text transformer

C Raffel, N Shazeer, A Roberts, K Lee, S Narang… - Journal of machine …, 2020 - jmlr.org
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-
tuned on a downstream task, has emerged as a powerful technique in natural language …
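
The text-to-text framing of the title means every task is cast as feeding a text string in and generating a text string out. A minimal usage sketch, assuming the Hugging Face transformers package and its "t5-small" checkpoint rather than the authors' original codebase:

    # Assumes the Hugging Face `transformers` package is installed; the "t5-small"
    # checkpoint and the task prefix follow that ecosystem's documented usage of T5.
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # Every task is text-to-text: a task prefix plus the input string.
    inputs = tokenizer("translate English to German: The house is wonderful.",
                       return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))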

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2023 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
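
For a sense of what scaling "both parameter count and training dataset size" means quantitatively, the sketch below evaluates the Chinchilla-style loss parameterization L(N, D) = E + A/N^alpha + B/D^beta that this line of work builds on; the constants are approximate values from Hoffmann et al. (2022), not numbers from this paper, which extends the form to repeated (data-constrained) data.

    # Illustrative scaling-law form: loss as a function of parameters N and
    # training tokens D.  Constants are approximate Hoffmann et al. (2022) fits.
    def chinchilla_loss(n_params: float, n_tokens: float,
                        E: float = 1.69, A: float = 406.4, B: float = 410.7,
                        alpha: float = 0.34, beta: float = 0.28) -> float:
        return E + A / n_params**alpha + B / n_tokens**beta

    # Adding parameters without more data leaves the data-limited term dominant.
    print(chinchilla_loss(7e9, 140e9))    # ~7B params, ~140B tokens
    print(chinchilla_loss(14e9, 140e9))   # more params, same data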

Rethinking attention with performers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
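
The truncation cuts off at "linear (as opposed to …"; the contrast is linear rather than quadratic cost in sequence length, achieved by approximating the softmax kernel with positive random features so attention can be computed from summed statistics instead of an L x L matrix. A toy version (simplified single-projection features, not the paper's exact FAVOR+ construction with orthogonal random blocks):

    import torch

    def positive_features(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        """Positive random features with E[phi(x) . phi(y)] = exp(x . y),
        a simplified form of the Performer feature map (w ~ N(0, I))."""
        m = w.shape[0]
        return torch.exp(x @ w.T - x.pow(2).sum(-1, keepdim=True) / 2) / m**0.5

    def performer_attention(q, k, v, num_features: int = 256):
        """Linear-time approximation of softmax attention: the statistics
        phi(K)^T V and phi(K)^T 1 are summed once; no L x L matrix is formed."""
        d = q.shape[-1]
        q, k = q / d**0.25, k / d**0.25        # so phi(q).phi(k) ~ exp(q.k / sqrt(d))
        w = torch.randn(num_features, d)
        q_p, k_p = positive_features(q, w), positive_features(k, w)   # (L, m)
        kv = k_p.T @ v                          # (m, d_v)
        normalizer = q_p @ k_p.sum(dim=0)       # (L,)
        return (q_p @ kv) / normalizer.unsqueeze(-1)

    L, d = 512, 64
    q, k, v = torch.randn(L, d), torch.randn(L, d), torch.randn(L, d)
    approx = performer_attention(q, k, v)
    exact = torch.softmax(q @ k.T / d**0.5, dim=-1) @ v
    print((approx - exact).abs().mean())        # shrinks as num_features grows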

Reformer: The efficient transformer

N Kitaev, Ł Kaiser, A Levskaya - arXiv preprint arXiv:2001.04451, 2020 - arxiv.org
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but
training these models can be prohibitively costly, especially on long sequences. We …
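
The mechanism behind the efficiency claim (not reached by the truncated snippet) is locality-sensitive hashing: each query attends only to keys that land in the same hash bucket. The sketch below shows a single-round angular-LSH bucketing step and where bucket-restricted attention would go; it omits Reformer's multi-round hashing, shared query/key space, chunking, and reversible layers.

    import torch

    def lsh_buckets(x: torch.Tensor, n_buckets: int = 8) -> torch.Tensor:
        """Assign each vector to a bucket via a random projection and argmax over
        [proj, -proj], so similar vectors tend to share a bucket (single round,
        simplified)."""
        d = x.shape[-1]
        r = torch.randn(d, n_buckets // 2)
        proj = x @ r
        return torch.argmax(torch.cat([proj, -proj], dim=-1), dim=-1)

    # Bucket-restricted attention: each query scores only keys in its own bucket,
    # so cost scales with bucket size rather than with the full sequence length.
    L, d = 1024, 64
    q = torch.randn(L, d)
    buckets = lsh_buckets(q)
    for b in range(buckets.max().item() + 1):
        idx = (buckets == b).nonzero(as_tuple=True)[0]
        # a full implementation would compute softmax(q[idx] @ k[idx].T) @ v[idx] here
        print(f"bucket {b}: {len(idx)} tokens")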

A transformer-based framework for multivariate time series representation learning

G Zerveas, S Jayaraman, D Patel… - Proceedings of the 27th …, 2021 - dl.acm.org
We present a novel framework for multivariate time series representation learning based on
the transformer encoder architecture. The framework includes an unsupervised pre-training …
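
The unsupervised pre-training mentioned in the snippet masks parts of the input series and trains the encoder to reconstruct them. A stripped-down sketch of that masked-reconstruction objective with a plain nn.TransformerEncoder (positional encoding and the authors' other details omitted; the layer sizes are arbitrary):

    import torch
    import torch.nn as nn

    class TSEncoder(nn.Module):
        """Minimal transformer encoder for multivariate series of shape
        (batch, length, n_vars) with a linear reconstruction head.
        Positional encoding is omitted for brevity."""
        def __init__(self, n_vars: int = 8, d_model: int = 64):
            super().__init__()
            self.embed = nn.Linear(n_vars, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_vars)

        def forward(self, x):
            return self.head(self.encoder(self.embed(x)))

    model = TSEncoder()
    x = torch.randn(16, 100, 8)                # (batch, length, variables)
    mask = torch.rand(16, 100, 8) < 0.15       # hide ~15% of the values
    x_masked = x.masked_fill(mask, 0.0)
    recon = model(x_masked)
    loss = ((recon - x)[mask] ** 2).mean()     # reconstruct only the masked entries
    loss.backward()
    print(loss.item())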