A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

H Wu, J Xu, J Wang, M Long - Advances in neural …, 2021 - proceedings.neurips.cc
Extending the forecasting time is a critical demand for real applications, such as extreme
weather early warning and long-term energy consumption planning. This paper studies the …
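
The snippet above stops at the motivation; as a rough illustration of the series-decomposition idea named in the title, the sketch below splits a series into a moving-average trend and a seasonal remainder. It is a simplified reading only (PyTorch, with an arbitrary window size), not the paper's Auto-Correlation mechanism or its full decomposition block.

    import torch
    import torch.nn.functional as F

    def series_decomp(x: torch.Tensor, kernel_size: int = 25):
        """Split series of shape (batch, length) into trend + seasonal parts
        via a moving average; kernel_size is an arbitrary illustrative choice."""
        pad = (kernel_size - 1) // 2
        # replicate-pad the ends so the moving average keeps the original length
        x_padded = F.pad(x.unsqueeze(1), (pad, kernel_size - 1 - pad), mode="replicate")
        trend = F.avg_pool1d(x_padded, kernel_size, stride=1).squeeze(1)
        seasonal = x - trend
        return seasonal, trend

    t = torch.linspace(0, 8 * torch.pi, 200)
    series = (0.05 * t + torch.sin(t)).unsqueeze(0)   # slow trend + periodic component
    seasonal, trend = series_decomp(series)
    print(seasonal.shape, trend.shape)                # torch.Size([1, 200]) twice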

CoAtNet: Marrying convolution and attention for all data sizes

Z Dai, H Liu, QV Le, M Tan - Advances in neural information …, 2021 - proceedings.neurips.cc
Transformers have attracted increasing interest in computer vision, but they still fall behind
state-of-the-art convolutional networks. In this work, we show that while Transformers tend to …
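
The title's "marrying convolution and attention" can be pictured with the toy module below: a depthwise-convolution stage for local features followed by self-attention over the flattened feature map for global mixing. This is only a generic hybrid sketch in PyTorch; the layer sizes are arbitrary and it does not reproduce the paper's staged layout or relative attention.

    import torch
    import torch.nn as nn

    class ConvThenAttention(nn.Module):
        """Toy hybrid block: local features via depthwise conv, then global
        mixing via multi-head self-attention over the flattened feature map."""
        def __init__(self, channels: int = 64, heads: int = 4):
            super().__init__()
            self.dwconv = nn.Conv2d(channels, channels, kernel_size=3,
                                    padding=1, groups=channels)   # depthwise conv
            self.pwconv = nn.Conv2d(channels, channels, kernel_size=1)
            self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
            self.norm = nn.LayerNorm(channels)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, channels, height, width)
            x = x + self.pwconv(self.dwconv(x))               # convolutional stage
            b, c, h, w = x.shape
            tokens = self.norm(x.flatten(2).transpose(1, 2))  # (batch, h*w, channels)
            attn_out, _ = self.attn(tokens, tokens, tokens)   # attention stage
            tokens = tokens + attn_out
            return tokens.transpose(1, 2).reshape(b, c, h, w)

    block = ConvThenAttention()
    print(block(torch.randn(2, 64, 14, 14)).shape)   # torch.Size([2, 64, 14, 14])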

RoFormer: Enhanced transformer with rotary position embedding

J Su, M Ahmed, Y Lu, S Pan, W Bo, Y Liu - Neurocomputing, 2024 - Elsevier
Position encoding has recently been shown to be effective in the transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
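
The rotary position embedding named in the title rotates each pair of query/key feature dimensions by an angle proportional to the token position, so that dot products between rotated queries and keys depend only on their relative offset. A minimal sketch of the commonly published formulation (the base of 10000 and the shapes here are conventional choices, not taken from this snippet):

    import torch

    def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
        """Apply rotary position embedding to x of shape (seq_len, dim), dim even.
        Each dimension pair (2i, 2i+1) is rotated by pos * base**(-2i/dim)."""
        seq_len, dim = x.shape
        pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)            # (seq_len, 1)
        inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim) # (dim/2,)
        angles = pos * inv_freq                                                  # (seq_len, dim/2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[:, 0::2], x[:, 1::2]
        rotated = torch.empty_like(x)
        rotated[:, 0::2] = x1 * cos - x2 * sin
        rotated[:, 1::2] = x1 * sin + x2 * cos
        return rotated

    q, k = torch.randn(6, 8), torch.randn(6, 8)
    # after rotation, q_i . k_j depends only on the relative offset i - j
    scores = rotary_embed(q) @ rotary_embed(k).T
    print(scores.shape)   # torch.Size([6, 6])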

Exploring the limits of transfer learning with a unified text-to-text transformer

C Raffel, N Shazeer, A Roberts, K Lee, S Narang… - Journal of machine …, 2020 - jmlr.org
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-
tuned on a downstream task, has emerged as a powerful technique in natural language …
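
The text-to-text framing of the title means every task is cast as feeding a text string in and generating a text string out. A minimal usage sketch, assuming the Hugging Face transformers package and its "t5-small" checkpoint rather than the authors' original codebase:

    # Assumes the Hugging Face `transformers` package is installed; the "t5-small"
    # checkpoint and the task prefix follow that ecosystem's documented usage of T5.
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # Every task is text-to-text: a task prefix plus the input string.
    inputs = tokenizer("translate English to German: The house is wonderful.",
                       return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))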

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2023 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
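
For a sense of what scaling "both parameter count and training dataset size" means quantitatively, the sketch below evaluates the Chinchilla-style loss parameterization L(N, D) = E + A/N^alpha + B/D^beta that this line of work builds on; the constants are approximate values from Hoffmann et al. (2022), not numbers from this paper, which extends the form to repeated (data-constrained) data.

    # Illustrative scaling-law form: loss as a function of parameters N and
    # training tokens D.  Constants are approximate Hoffmann et al. (2022) fits.
    def chinchilla_loss(n_params: float, n_tokens: float,
                        E: float = 1.69, A: float = 406.4, B: float = 410.7,
                        alpha: float = 0.34, beta: float = 0.28) -> float:
        return E + A / n_params**alpha + B / n_tokens**beta

    # Adding parameters without more data leaves the data-limited term dominant.
    print(chinchilla_loss(7e9, 140e9))    # ~7B params, ~140B tokens
    print(chinchilla_loss(14e9, 140e9))   # more params, same data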

Rethinking attention with performers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
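
The truncation cuts off at "linear (as opposed to …"; the contrast is linear rather than quadratic cost in sequence length, achieved by approximating the softmax kernel with positive random features so attention can be computed from summed statistics instead of an L x L matrix. A toy version (simplified single-projection features, not the paper's exact FAVOR+ construction with orthogonal random blocks):

    import torch

    def positive_features(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        """Positive random features with E[phi(x) . phi(y)] = exp(x . y),
        a simplified form of the Performer feature map (w ~ N(0, I))."""
        m = w.shape[0]
        return torch.exp(x @ w.T - x.pow(2).sum(-1, keepdim=True) / 2) / m**0.5

    def performer_attention(q, k, v, num_features: int = 256):
        """Linear-time approximation of softmax attention: the statistics
        phi(K)^T V and phi(K)^T 1 are summed once; no L x L matrix is formed."""
        d = q.shape[-1]
        q, k = q / d**0.25, k / d**0.25        # so phi(q).phi(k) ~ exp(q.k / sqrt(d))
        w = torch.randn(num_features, d)
        q_p, k_p = positive_features(q, w), positive_features(k, w)   # (L, m)
        kv = k_p.T @ v                          # (m, d_v)
        normalizer = q_p @ k_p.sum(dim=0)       # (L,)
        return (q_p @ kv) / normalizer.unsqueeze(-1)

    L, d = 512, 64
    q, k, v = torch.randn(L, d), torch.randn(L, d), torch.randn(L, d)
    approx = performer_attention(q, k, v)
    exact = torch.softmax(q @ k.T / d**0.5, dim=-1) @ v
    print((approx - exact).abs().mean())        # shrinks as num_features grows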

Reformer: The efficient transformer

N Kitaev, Ł Kaiser, A Levskaya - arXiv preprint arXiv:2001.04451, 2020 - arxiv.org
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but
training these models can be prohibitively costly, especially on long sequences. We …
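
The mechanism behind the efficiency claim (not reached by the truncated snippet) is locality-sensitive hashing: each query attends only to keys that land in the same hash bucket. The sketch below shows a single-round angular-LSH bucketing step and where bucket-restricted attention would go; it omits Reformer's multi-round hashing, shared query/key space, chunking, and reversible layers.

    import torch

    def lsh_buckets(x: torch.Tensor, n_buckets: int = 8) -> torch.Tensor:
        """Assign each vector to a bucket via a random projection and argmax over
        [proj, -proj], so similar vectors tend to share a bucket (single round,
        simplified)."""
        d = x.shape[-1]
        r = torch.randn(d, n_buckets // 2)
        proj = x @ r
        return torch.argmax(torch.cat([proj, -proj], dim=-1), dim=-1)

    # Bucket-restricted attention: each query scores only keys in its own bucket,
    # so cost scales with bucket size rather than with the full sequence length.
    L, d = 1024, 64
    q = torch.randn(L, d)
    buckets = lsh_buckets(q)
    for b in range(buckets.max().item() + 1):
        idx = (buckets == b).nonzero(as_tuple=True)[0]
        # a full implementation would compute softmax(q[idx] @ k[idx].T) @ v[idx] here
        print(f"bucket {b}: {len(idx)} tokens")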

A transformer-based framework for multivariate time series representation learning

G Zerveas, S Jayaraman, D Patel… - Proceedings of the 27th …, 2021 - dl.acm.org
We present a novel framework for multivariate time series representation learning based on
the transformer encoder architecture. The framework includes an unsupervised pre-training …
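
The unsupervised pre-training mentioned in the snippet masks parts of the input series and trains the encoder to reconstruct them. A stripped-down sketch of that masked-reconstruction objective with a plain nn.TransformerEncoder (positional encoding and the authors' other details omitted; the layer sizes are arbitrary):

    import torch
    import torch.nn as nn

    class TSEncoder(nn.Module):
        """Minimal transformer encoder for multivariate series of shape
        (batch, length, n_vars) with a linear reconstruction head.
        Positional encoding is omitted for brevity."""
        def __init__(self, n_vars: int = 8, d_model: int = 64):
            super().__init__()
            self.embed = nn.Linear(n_vars, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_vars)

        def forward(self, x):
            return self.head(self.encoder(self.embed(x)))

    model = TSEncoder()
    x = torch.randn(16, 100, 8)                # (batch, length, variables)
    mask = torch.rand(16, 100, 8) < 0.15       # hide ~15% of the values
    x_masked = x.masked_fill(mask, 0.0)
    recon = model(x_masked)
    loss = ((recon - x)[mask] ** 2).mean()     # reconstruct only the masked entries
    loss.backward()
    print(loss.item())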