A survey of transformers
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …
Challenges and applications of large language models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …
Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting
Extending the forecasting horizon is a critical requirement in real-world applications, such as extreme
weather early warning and long-term energy consumption planning. This paper studies the …
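The decomposition named in the title refers to splitting a series into a slowly varying trend and a seasonal remainder. The sketch below illustrates that idea with a simple moving-average decomposition in NumPy; the function name, kernel size, and edge padding are illustrative choices, not Autoformer's actual inner decomposition block (which also pairs the decomposition with an auto-correlation attention mechanism).

```python
import numpy as np

def decompose(series: np.ndarray, kernel_size: int = 25):
    """Split a 1-D series into a smooth trend (moving average) and a seasonal remainder."""
    pad = kernel_size // 2
    # Repeat the edge values so the moving average keeps the original length.
    padded = np.concatenate([np.full(pad, series[0]), series, np.full(pad, series[-1])])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = series - trend
    return trend, seasonal

t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(200)
trend, seasonal = decompose(x)
```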
CoAtNet: Marrying convolution and attention for all data sizes
Transformers have attracted increasing interest in computer vision, but they still fall behind
state-of-the-art convolutional networks. In this work, we show that while Transformers tend to …
RoFormer: Enhanced transformer with rotary position embedding
Position encoding has recently been shown to be effective in the transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
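The rotary position embedding named in the title injects position by rotating each pair of query/key features through an angle proportional to the token's position, so attention logits depend only on relative offsets. A minimal NumPy sketch, assuming the common pairing of adjacent feature dimensions (implementations differ in how features are paired):

```python
import numpy as np

def rotary_embed(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even.

    Each feature pair (2i, 2i+1) is rotated by angle pos * base**(-2i/dim), so the
    dot product between rotated queries and keys depends on their relative position.
    """
    seq_len, dim = x.shape
    pos = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    freqs = base ** (-np.arange(0, dim, 2) / dim)     # (dim/2,)
    angles = pos * freqs                              # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.randn(8, 16)
k = np.random.randn(8, 16)
scores = rotary_embed(q) @ rotary_embed(k).T          # relative-position-aware logits
```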
Exploring the limits of transfer learning with a unified text-to-text transformer
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-
tuned on a downstream task, has emerged as a powerful technique in natural language …
Scaling data-constrained language models
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
Rethinking attention with performers
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
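The linear-complexity claim rests on replacing softmax attention with a kernel feature map phi, so that softmax(QK^T)V is approximated by phi(Q)(phi(K)^T V) without ever materializing the n-by-n attention matrix. A rough sketch using plain positive random features; the actual FAVOR+ mechanism additionally uses orthogonal features and other stabilizations, and the function names and feature count here are my own choices:

```python
import numpy as np

def linear_attention(Q, K, V, num_features: int = 256, seed: int = 0):
    """Approximate softmax attention in O(n) time via positive random features.

    phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m) gives E[phi(q) . phi(k)] = exp(q . k),
    so attention becomes phi(Q) (phi(K)^T V) with a row-wise normaliser.
    """
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_features, d))
    scale = d ** -0.25                      # fold the 1/sqrt(d) temperature into q and k

    def phi(X):
        X = X * scale
        return np.exp(X @ W.T - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(num_features)

    Qf, Kf = phi(Q), phi(K)                 # (n, m) feature maps
    KV = Kf.T @ V                           # (m, d_v): computed once, no n x n matrix
    Z = Qf @ Kf.sum(axis=0)                 # normaliser, replaces the softmax denominator
    return (Qf @ KV) / Z[:, None]

Q = np.random.randn(128, 32); K = np.random.randn(128, 32); V = np.random.randn(128, 32)
out = linear_attention(Q, K, V)             # approximates softmax(QK^T / sqrt(d)) V
```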
Reformer: The efficient transformer
Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but
training these models can be prohibitively costly, especially on long sequences. We …
A transformer-based framework for multivariate time series representation learning
We present a novel framework for multivariate time series representation learning based on
the transformer encoder architecture. The framework includes an unsupervised pre-training …
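The unsupervised pre-training mentioned in the abstract is, at a high level, masked-value reconstruction with a transformer encoder. A minimal PyTorch sketch under that assumption; the class name, masking rate, and the omission of positional encodings are simplifications for illustration, not the paper's code:

```python
import torch
import torch.nn as nn

class MaskedTSEncoder(nn.Module):
    """Transformer encoder pre-trained by reconstructing randomly masked input values."""
    def __init__(self, n_vars: int, d_model: int = 64, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.inp = nn.Linear(n_vars, d_model)                 # project variables to model width
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(d_model, n_vars)                 # reconstruct the original variables

    def forward(self, x):                                     # x: (batch, time, n_vars)
        return self.out(self.encoder(self.inp(x)))

x = torch.randn(8, 96, 7)                                     # 8 series, 96 steps, 7 variables
mask = torch.rand(8, 96, 7) < 0.15                            # hide ~15% of the values
model = MaskedTSEncoder(n_vars=7)
recon = model(torch.where(mask, torch.zeros_like(x), x))      # encode the masked input
loss = ((recon - x)[mask] ** 2).mean()                        # MSE only on masked positions
loss.backward()
```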